Merge puzzletron compression algorithm (#1121)
Conversation
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…iv.org/abs/2411.19146 (#464) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do? Add decilm modelling code --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Compress tutorial (PoC) + compress cli app. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com> Signed-off-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Liana Mikaelyan <lmikaelyan@nvidia.com> Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
#545) ## What does this PR do?
Add llama converter (no dependency on internal Nvidia code) - part 1/2
- change top-level dependencies in convert_llama3_to_decilm.py from puzzle_tools.... to modelopt.....
- added modelopt.torch._compress.tools module
- remove tokenization_mistral.py - not used

Scope of part 2/2 (will come once part 1/2 is merged):
- change all deeper dependencies from puzzle_tools.... to modelopt....
- test_convert_llama3_config_to_decilm_config.py should run without any internal nvidia dependencies
---------
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ia code) (#552) ## What does this PR do? llama converter is self-contained now (no dependency on internal nvidia code) --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Make score pruning activations self-contained (no dependency on internal Nvidia code) - 1/6 - add integration test for attention pruning (pre-step to make adding activation scoring safer) --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?
- Add score_pruning_activations.py

Notes:
- validate_model.py still depends on Nvidia internal code (will be changed in a subsequent MR)
- sharded_checkpoint_utils.py - for now it needs to use DeciLM from internal Nvidia code, to be changed in the next MR
---------
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do? Add activation hooks used for pruning --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ng is self-contained now (#584) ## What does this PR do? Add sewing kit and utilities used for pruning scoring - pruning scoring is self-contained now - no dependency on internal Nvidia code. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?
- Add L2NormHook and use it in megatron.py
- Using L2NormHook removes code duplication between
_DynamicSelfAttention and _DynamicMLP
This is the first step towards reusing activation scores logic across
Minitron and Puzzle. Next steps:
- complete redesign of megatron.py - move other activation hooks logic
to hooks.py
- then combine those hooks.py with the similar hooks.py functionality in
puzzle
(modelopt/torch/_compress/activation_scoring/activation_hooks/hooks.py)
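For context, the idea behind an L2-norm activation hook like the one described above can be sketched as follows. This is a minimal, framework-free illustration (the class name mirrors L2NormHook but the shapes and list-based API are made up; the real hook operates on torch tensors registered via `register_forward_hook`):

```python
import math

class L2NormHook:
    # Illustrative stand-in for an L2-norm activation hook: accumulate
    # per-channel squared activation norms across forward passes, then
    # report an aggregate score per channel for pruning decisions.
    def __init__(self, num_channels: int):
        self.sq_sums = [0.0] * num_channels
        self.calls = 0

    def __call__(self, activations: list[list[float]]) -> None:
        # activations: a batch of per-channel activation vectors
        for row in activations:
            for c, value in enumerate(row):
                self.sq_sums[c] += value * value
        self.calls += 1

    def scores(self) -> list[float]:
        # L2 norm per channel, aggregated over everything seen so far
        return [math.sqrt(s) for s in self.sq_sums]

hook = L2NormHook(num_channels=3)
hook([[1.0, 0.0, 2.0], [3.0, 4.0, 0.0]])
print([round(s, 3) for s in hook.scores()])  # [3.162, 4.0, 2.0]
```

Channels with consistently small L2 scores are the natural pruning candidates, which is why the same accumulator can serve both _DynamicSelfAttention and _DynamicMLP.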
Questions:
- why, in the code both before and after this redesign, do we store temp
variables in two ways (_register_temp_attribute and self.hook_handle)?
```python
self._register_temp_attribute("_activation_hook", activation_hook)
# TODO: confusion: why hook_handle is removed manually in export() and not using _register_temp_attribute?
self.hook_handle = self.linear_fc2.register_forward_hook(activation_hook)
```
---------
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do? Add pruning checkpoints for the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add build replacement library to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add subblock stats to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add 1-block scoring to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…utionHook (#610) ## What does this PR do? Add checkpoint save/load to ForwardHook + add IterativeChannelContributionHook. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add MIP step to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?
**Type of change:** Improvement
- Fix tests for 2-gpu: some places hard-coded the cpu device for distributed communications, which was causing this issue
- Remove unused constrain_search_space.py
- Remove `is_multi_layer_puzzle: False` case
- Remove `use_greedy_search: False` case
- Remove knapsack mip case
- Remove unused `num_solutions` and `minimal_diversity` flags

## Testing
- GH CICD tests passing
- Tested on a 2-gpu setup locally as well

## Summary by CodeRabbit
* **Refactor**
  * Optimized solver implementation with improved library integration.
  * Simplified model compression configuration by removing deprecated search options.
  * Consolidated optimization paths for streamlined processing.
* **Chores**
  * Updated dependencies for improved compatibility.
* **Documentation**
  * Clarified Model-Optimizer installation instructions in examples.
---------
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…hooks analysis (#670) ## What does this PR do? Fix a bug in IterativeChannelContributionHook + tools for activation hooks analysis --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…unctions (#667) ## What does this PR do?
- Remove `runtime.py` and directly use `modelopt.torch.utils.distributed` functions
- Read model_dtype and autocast_dtype from validate_model_defaults.yaml instead of the runtime object
- Remove more unused functions
- Remove unnecessary parse_args for intermediate steps, and improve docstrings

## Testing
- No change in functionality
- Existing tests passing
---------
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
📝 Walkthrough
Adds a new Puzzletron feature: an AnyModel-based heterogeneous LLM compression/NAS pipeline with converters, model descriptors, pruning hooks/mixins, activation scoring, MIP optimization, replacement libraries, pruning checkpoint tooling, evaluation/deployment adapters, and many example configs/docs/workflows.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI as Puzzletron CLI
    participant Hydra as Hydra Config
    participant Converter
    participant Scorer as Activation Scoring
    participant MIP as MIP Solver
    participant Realize as Model Realizer/Checkpoint Writer
    User->>CLI: run --config <hydra_cfg> [--mip-only]
    CLI->>Hydra: load config (puzzle_dir, dataset_path, overrides)
    CLI->>Converter: convert HF -> AnyModel subblocks (master only)
    CLI->>Scorer: launch_score_activations (distributed)
    Scorer-->>CLI: activations logs / checkpoints
    CLI->>MIP: run_puzzle (uses activations, subblock stats, replacement_library)
    MIP-->>Realize: chosen solutions
    Realize-->>CLI: realized model checkpoints
    CLI->>User: finished (paths to realized models)
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
Actionable comments posted: 12
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟡 Minor comments (15)
modelopt/torch/puzzletron/tools/logger.py-164-168 (1)
164-168: ⚠️ Potential issue | 🟡 Minor
Docstring says "rank -1 in node 0" but implementation may not match.
The `lprint` docstring says "Last rank prints only (rank -1 in node 0)" but the implementation checks `local_rank != world_size - 1`. If the intent is the last rank globally, this logic appears incorrect. If the intent is the last local rank within a node, the comparison should be against the local world size, not the global world size.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `modelopt/torch/puzzletron/tools/logger.py` around lines 164-168, the docstring for `lprint` mismatches its rank-check logic; update either the docstring or the implementation to match the intended behavior: if you want "last rank in node 0" (last local rank), change the check to compare `local_rank` against `local_world_size - 1` (use the local world size variable used elsewhere in the logger or from torch.distributed) before calling `logger.dist_log`; or, if you intended the global last rank, adjust the docstring to "last global rank" and ensure the check compares against `world_size - 1` accordingly.
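The two interpretations of "last rank" discussed in this finding can be made concrete with a small framework-free sketch (the function names are illustrative, not from the codebase):

```python
# Hypothetical predicates showing the two meanings the lprint check could intend.

def is_last_global_rank(rank: int, world_size: int) -> bool:
    # Last rank across the whole distributed job.
    return rank == world_size - 1

def is_last_local_rank(local_rank: int, local_world_size: int) -> bool:
    # Last rank within a single node.
    return local_rank == local_world_size - 1

# With 2 nodes of 4 GPUs each (world_size=8, local_world_size=4):
# global rank 7 is the last global rank, while local_rank 3 is the
# last local rank on every node.
print(is_last_global_rank(7, 8))   # True
print(is_last_local_rank(3, 4))    # True
print(is_last_global_rank(3, 8))   # False
```

Comparing `local_rank` against the global `world_size - 1`, as the current code does, satisfies neither predicate on a multi-node job, which is exactly the ambiguity the review flags.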
modelopt/torch/puzzletron/tools/logger.py-62-66 (1)
62-66: ⚠️ Potential issue | 🟡 Minor
Error message is inconsistent with valid choices.
The error message on line 65 lists `['all', 'main', 'local_main']` but `'last'` is also a valid choice per line 62.
🐛 Proposed fix
```diff
 if ranks not in ["all", "main", "local_main", "last"]:
     raise NotImplementedError(
         f"Could not broadcast msg {msg} - "
-        f"ranks parameters choices are ['all', 'main', 'local_main']. Got {ranks}"
+        f"ranks parameters choices are ['all', 'main', 'local_main', 'last']. Got {ranks}"
     )
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `modelopt/torch/puzzletron/tools/logger.py` around lines 62-66, the NotImplementedError message for the ranks check mentions the wrong set of valid choices; update the f-string in the ranks validation to include 'last' in the displayed list (i.e., ["all", "main", "local_main", "last"]) so the error accurately reflects the valid options.
examples/pruning/README.md-10-10 (1)
10-10: ⚠️ Potential issue | 🟡 Minor
Fix compound adjective hyphenation in docs.
At line 10, use "MIP-based" (not "MIP based") for correct technical phrasing.
✏️ Proposed edit
```diff
-1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA using Mixed Integer Programming (MIP) based NAS search algorithm.
+1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA using a Mixed Integer Programming (MIP)-based NAS search algorithm.
```
As per coding guidelines, "**/*.{py,cpp,h,hpp,md}: All code (Python, C++, Markdown, etc.) must adhere to coding standards as checked by pre-commit hooks".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `examples/pruning/README.md` at line 10, update the compound adjective "MIP based" to the hyphenated form "MIP-based" so the phrase reads "MIP-based NAS search algorithm", complying with the Markdown/style pre-commit checks.
examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml-14-17 (1)
14-17: ⚠️ Potential issue | 🟡 Minor
Fix the mismatch between `target_memory` and its comment.
`16_000` MiB is about 15.6 GiB; 45 GiB would be `46_080` MiB. One of these is wrong, and the current ambiguity can make the example optimize for the wrong budget.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml` around lines 14-17, the `mip.human_constraints.target_memory` value (`target_memory: 16_000`) and its comment ("45 GiB") disagree; either set target_memory to 46_080 (MiB) if you mean 45 GiB, or change the comment to match 16_000 MiB (~15.6 GiB).
examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml-14-17 (1)
14-17: ⚠️ Potential issue | 🟡 Minor
Align the MiB value with the GiB comment.
`90_000` MiB is about 87.9 GiB, not 90 GiB. If the intended cap is 90 GiB, this should be `92_160`; otherwise update the comment so the search target is unambiguous.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml` around lines 14-17, the target_memory under mip.human_constraints is inconsistent with the "90 GiB" comment; update target_memory to 92_160 (MiB) to represent 90 GiB, or change the comment to the actual GiB represented by 90_000 MiB (~87.9 GiB).
examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/llama-3_2-3B_pruneffn_memory.yaml-17-17 (1)
17-17: ⚠️ Potential issue | 🟡 Minor
Memory unit comment is inaccurate.
Line 17 uses `45_000` MiB, which is ~43.95 GiB, not 45 GiB. Please update either the value or the comment to avoid misconfigured constraints.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/llama-3_2-3B_pruneffn_memory.yaml` at line 17, the comment next to target_memory is incorrect: `target_memory: 45_000` is in MiB (~43.95 GiB), not 45 GiB; either set target_memory to 46_080 (45 * 1024) to represent 45 GiB exactly, or update the inline comment to the correct GiB approximation (≈43.95 GiB).
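The three MiB/GiB findings above all reduce to the same 1024-based conversion. A quick check, using only the values quoted in the comments:

```python
# MiB <-> GiB conversion behind the target_memory findings above.
MIB_PER_GIB = 1024

def mib_to_gib(mib: int) -> float:
    return mib / MIB_PER_GIB

def gib_to_mib(gib: float) -> int:
    return int(gib * MIB_PER_GIB)

print(round(mib_to_gib(16_000), 1))  # 15.6  -> not 45 GiB
print(gib_to_mib(45))                # 46080 -> what 45 GiB actually requires
print(round(mib_to_gib(90_000), 1))  # 87.9  -> not 90 GiB
print(gib_to_mib(90))                # 92160
print(round(mib_to_gib(45_000), 2))  # 43.95 -> not 45 GiB
```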
modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py-36-52 (1)
36-52: ⚠️ Potential issue | 🟡 Minor
Return type annotation mismatch: method declares `List[BlockConfig]` but returns `List[dict]`.
The method signature declares a return type of `List[BlockConfig]`, but line 50 calls `.to_dict()` on each BlockConfig instance, causing the method to actually return `List[dict]`. Update the return type annotation to match the actual behavior.
🔧 Suggested fix
```diff
 @staticmethod
-def create_block_configs_from_main_config(config: LlamaConfig) -> List[BlockConfig]:
+def create_block_configs_from_main_config(config: LlamaConfig) -> List[dict]:
     """Create uniform block configs for all Llama layers.
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py` around lines 36-52, create_block_configs_from_main_config builds BlockConfig objects but calls `.to_dict()`, so its declared return type `List[BlockConfig]` is incorrect; update the signature to return `List[dict]` to match the actual returned value, adjusting the typing import if needed.
modelopt/torch/puzzletron/anymodel/converter/converter_factory.py-43-47 (1)
43-47: ⚠️ Potential issue | 🟡 Minor
Handle potential `None` from `inspect.getmodule()`.
`inspect.getmodule(ref)` can return `None` if the module cannot be determined, which would cause an `AttributeError` when accessing `.__name__`.
🛡️ Proposed fix
```diff
-raise KeyError(
-    f"Could not register `{cls_name}`: {cls_type}, "
-    f"`{cls_name}` is already registered and points to "
-    f"`{inspect.getmodule(ref).__name__}.{ref.__name__}`"
-)
+module = inspect.getmodule(ref)
+module_name = module.__name__ if module else "<unknown>"
+raise KeyError(
+    f"Could not register `{cls_name}`: {cls_type}, "
+    f"`{cls_name}` is already registered and points to "
+    f"`{module_name}.{ref.__name__}`"
+)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `modelopt/torch/puzzletron/anymodel/converter/converter_factory.py` around lines 43-47, the KeyError message assumes inspect.getmodule(ref) always returns a module; guard against None by computing a safe fallback (e.g., module = inspect.getmodule(ref); module_name = module.__name__ if module is not None else getattr(ref, "__module__", "unknown")) and use module_name in the f-string, accessing ref.__name__ safely so the message itself cannot raise AttributeError.
modelopt/torch/puzzletron/puzzletron.py-60-74 (1)
60-74: ⚠️ Potential issue | 🟡 Minor
Step numbering is inconsistent.
Comments jump from "Step 2" to "Step 4", skipping "Step 3". This appears to be a typo or leftover from a refactor. Consider renumbering for clarity:
📝 Proposed fix
```diff
 # Step 2: pruning_ckpts (single process)
 if dist.is_master():
     pruning_ckpts.launch_prune_ckpt(hydra_cfg)
 dist.barrier()
-# Step 4: build_library_and_stats (single process)
+# Step 3: build_library_and_stats (single process)
 if dist.is_master():
     build_library_and_stats.launch_build_library_and_stats(hydra_cfg)
 dist.barrier()
-# Step 5: calc_one_block_scores (distributed processing)
+# Step 4: calc_one_block_scores (distributed processing)
 scoring.launch_scoring(hydra_cfg)
-# Step 6: mip_and_realize_models (distributed processing)
+# Step 5: mip_and_realize_models (distributed processing)
 mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `modelopt/torch/puzzletron/puzzletron.py` around lines 60-74, the step comments are misnumbered (jumping from "Step 2" to "Step 4"); renumber the inline comments around the calls to pruning_ckpts.launch_prune_ckpt, build_library_and_stats.launch_build_library_and_stats, scoring.launch_scoring, and mip_and_realize_models.launch_mip_and_realize_model so the sequence reflects the actual order of execution.
examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml-20-31 (1)
20-31: ⚠️ Potential issue | 🟡 Minor
Remove unused `ffn_list` definition or align with `intermediate_size_list` usage.
The file serves as a defaults template for Hydra configuration, but there's an inconsistency: `ffn_list` is defined on line 21 with no value, yet the actual FFN pruning configuration (ffn_pruning.yaml) populates and uses `intermediate_size_list` instead. Either rename `ffn_list` to `intermediate_size_list` for consistency, or remove the unused `ffn_list` definition.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml` around lines 20-31, the defaults file defines an unused key ffn_list while the FFN pruning logic uses intermediate_size_list (see ffn_pruning.yaml); either remove the ffn_list entry entirely or rename it to intermediate_size_list, and ensure the init mode key (mlp_init_mode) matches the name expected by ffn_pruning.yaml so the Hydra defaults align with the pruning config.
modelopt/torch/puzzletron/tools/hydra_utils.py-29-36 (1)
29-36: ⚠️ Potential issue | 🟡 Minor
Validate `block` and `mbs` before division in `warmup_steps`.
`block=0` or `mbs=0` causes a runtime failure. Add explicit positive-value checks and fail early with a clear error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `modelopt/torch/puzzletron/tools/hydra_utils.py` around lines 29-36, warmup_steps divides by block and mbs, which will raise on zero; validate that block and mbs are positive integers at the top of the function and raise a clear ValueError if not; keep the existing logic (compute steps = (tokens // block) // mbs, compute w = pct * steps, return max(1, round(w))) but fail fast with an explanatory error naming block and mbs so callers get a clear message when invalid values are passed.
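A hedged sketch of the validated helper: the signature and internals below are reconstructed from the formula quoted in the prompt (`steps = (tokens // block) // mbs`, `max(1, round(pct * steps))`), not copied from the repository:

```python
def warmup_steps(tokens: int, block: int, mbs: int, pct: float) -> int:
    # Hypothetical reconstruction of the helper discussed above:
    # fail fast on non-positive divisors instead of letting the
    # integer division raise ZeroDivisionError deep in a Hydra run.
    if block <= 0 or mbs <= 0:
        raise ValueError(f"block and mbs must be positive, got block={block}, mbs={mbs}")
    steps = (tokens // block) // mbs       # total optimizer steps
    return max(1, round(pct * steps))      # warmup fraction, clamped to >= 1

print(warmup_steps(1_000_000, 4096, 8, 0.01))   # 1  (0.01 * 30 rounds to 0, clamped to 1)
print(warmup_steps(10_000_000, 1024, 4, 0.1))   # 244
```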
modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py-46-47 (1)
46-47: ⚠️ Potential issue | 🟡 Minor
Avoid mutating caller-owned `activation_hooks_kwargs`.
This function modifies the input dict in place (`["model"] = model`), which can leak state across calls. Copy it first before enrichment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py` around lines 46-47, the code mutates the caller-owned dict activation_hooks_kwargs by setting activation_hooks_kwargs["model"] = model; instead, make a shallow copy (e.g., new_kwargs = dict(activation_hooks_kwargs)), add the "model" key to the copy, and use/return that copy so the original dict is not mutated.
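The copy-before-enrich pattern suggested above, shown side by side with the leaky in-place variant (function names here are illustrative, not the repo's):

```python
def enrich_kwargs_inplace(kwargs: dict, model: object) -> dict:
    # The anti-pattern: writes into the caller's dict.
    kwargs["model"] = model
    return kwargs

def enrich_kwargs_copy(kwargs: dict, model: object) -> dict:
    # Shallow copy first: the caller's dict stays untouched.
    new_kwargs = dict(kwargs)
    new_kwargs["model"] = model
    return new_kwargs

caller_owned = {"hook": "l2_norm"}
enriched = enrich_kwargs_copy(caller_owned, model="fake-model")
print("model" in caller_owned)   # False: original not mutated
print("model" in enriched)       # True

enrich_kwargs_inplace(caller_owned, model="fake-model")
print("model" in caller_owned)   # True: state leaked back to the caller
```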
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml-13-13 (1)
13-13: ⚠️ Potential issue | 🟡 Minor
Minor typo in comment.
`ppath` should be `path`.
Proposed fix
```diff
-dataset_path: ??? # ppath to Nemotron-Post-Training-Dataset-v2
+dataset_path: ??? # path to Nemotron-Post-Training-Dataset-v2
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml` at line 13, fix the typo in the comment after the dataset_path entry so it reads "path to Nemotron-Post-Training-Dataset-v2" instead of "ppath to Nemotron-Post-Training-Dataset-v2".
modelopt/torch/puzzletron/tools/validate_model.py-138-140 (1)
138-140: ⚠️ Potential issue | 🟡 Minor
Potential issue: `validation_full_iters` could be zero.
If `args.eval_samples` is less than `args.micro_batch_size`, the integer division results in `validation_full_iters = 0`. This could cause unexpected behavior in downstream code that expects at least one iteration.
🛡️ Proposed fix: Add validation or use ceiling division
```diff
 validation_full_iters = (
-    args.eval_samples // args.micro_batch_size
+    max(1, args.eval_samples // args.micro_batch_size)
 )  # model pipeline, single data rank
```
Or add an assertion:
```diff
 validation_full_iters = (
     args.eval_samples // args.micro_batch_size
 )  # model pipeline, single data rank
+assert validation_full_iters > 0, f"eval_samples ({args.eval_samples}) must be >= micro_batch_size ({args.micro_batch_size})"
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `modelopt/torch/puzzletron/tools/validate_model.py` around lines 138-140, the calculation of validation_full_iters can yield zero when args.eval_samples < args.micro_batch_size; ensure at least one validation iteration by either validating inputs (raise/exit if args.eval_samples < args.micro_batch_size) or taking max(1, ...) over the division, so downstream loops that expect >= 1 iteration do not break.
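The zero-iteration pitfall and the two guards can be demonstrated in a few lines (variable names mirror the review, not the actual module):

```python
eval_samples, micro_batch_size = 8, 16   # fewer samples than one micro-batch

floor_iters = eval_samples // micro_batch_size
print(floor_iters)                        # 0 -> a downstream loop would run zero times

clamped_iters = max(1, eval_samples // micro_batch_size)
print(clamped_iters)                      # 1

ceil_iters = -(-eval_samples // micro_batch_size)   # ceiling division via negation
print(ceil_iters)                         # 1
```

Note the difference in intent: `max(1, ...)` always forces one iteration even for `eval_samples == 0`, while ceiling division only rounds a partial batch up.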
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py-316-321 (1)
316-321: ⚠️ Potential issue | 🟡 Minor
Actually emit the missing-MoE-stats warning.
`Warning(...)` only instantiates an object; it does not notify the caller. Right now a missing `moe_stats.json` silently changes `active_params` from MoE-aware counts to total params. Use `warnings.warn(...)` or `mprint(...)` here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines 316-321, the code only instantiates Warning(...) on a missing file, which does nothing; actually emit a warning by calling warnings.warn(...) (or mprint(...)) with a clear message like f"MOE stats file {moe_stats_file} does not exist, can't calculate num active params", adding `import warnings` at the top if needed; keep setting moe_stats_file = None so the fallback to total params remains.
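The distinction this finding hinges on is easy to verify: constructing a `Warning` object is silent, while `warnings.warn(...)` actually emits something a caller can see or capture.

```python
import warnings

# Instantiating Warning(...) just builds an exception object; nothing is emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    Warning("moe_stats.json does not exist")
    print(len(caught))   # 0

# warnings.warn(...) actually raises the warning machinery.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    warnings.warn("moe_stats.json does not exist, can't calculate num active params")
    print(len(caught))   # 1
```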
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 31341e54-9727-48f6-b865-606bb7607464
⛔ Files ignored due to path filters (2)
- `examples/puzzletron/mip_sweep_example.png` is excluded by `!**/*.png`
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (212)
```text
.github/CODEOWNERS
.github/workflows/example_tests.yml
.github/workflows/gpu_tests.yml
.pre-commit-config.yaml
examples/pruning/README.md
examples/puzzletron/GPTOSS.md
examples/puzzletron/README.md
examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b.yaml
examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml
examples/puzzletron/configs/gptoss-20b_remove_experts_memory/pruning/ffn_pruning.yaml
examples/puzzletron/configs/gptoss-20b_remove_experts_memory/pruning/pruning_defaults.yaml
examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_model_defaults.yaml
examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_solutions_defaults.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/attn_pruning.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/ffn_pruning.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/hidden_dim_pruning.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/pruning_defaults.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_model_defaults.yaml
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_solutions_defaults.yaml
examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/Llama-3_2-3B.yaml
examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/llama-3_2-3B_pruneffn_memory.yaml
examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/pruning/ffn_pruning.yaml
examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/pruning/pruning_defaults.yaml
examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/validate_model_defaults.yaml
examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/validate_solutions_defaults.yaml
examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/Mistral-Small-24B.yaml
examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/mistral-small-24b-instruct-2501_pruneffn_memory.yaml
examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/attn_pruning.yaml
examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/ffn_pruning.yaml
examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/hidden_dim_pruning.yaml
examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/pruning_defaults.yaml
examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/validate_model_defaults.yaml
examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/validate_solutions_defaults.yaml
examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2.yaml
examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml
examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/attn_pruning.yaml
examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/ffn_pruning.yaml
examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/hidden_dim_pruning.yaml
examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml
examples/puzzletron/configs/nemotron-nano-12b-v2/validate_model_defaults.yaml
examples/puzzletron/configs/nemotron-nano-12b-v2/validate_solutions_defaults.yaml
examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/attn_pruning.yaml
examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/ffn_pruning.yaml
examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/hidden_dim_pruning.yaml
examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/pruning_defaults.yaml
examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/qwen2_5_7b_instruct.yaml
examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/qwen2_5_7b_instruct_pruneffn_memory.yaml
examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/validate_model_defaults.yaml
examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/validate_solutions_defaults.yaml
examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/attn_pruning.yaml
examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/ffn_pruning.yaml
examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/hidden_dim_pruning.yaml
examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/pruning_defaults.yaml
examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b.yaml
examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b_pruneffn_memory.yaml
examples/puzzletron/configs/qwen3-8b_pruneffn_memory/validate_model_defaults.yaml
examples/puzzletron/configs/qwen3-8b_pruneffn_memory/validate_solutions_defaults.yaml
examples/puzzletron/evaluation/hf_deployable_anymodel.py
examples/puzzletron/evaluation/lm_eval_anymodel.py
examples/puzzletron/evaluation/nemo_evaluator_instructions.md
examples/puzzletron/main.py
examples/puzzletron/mbridge_distillation/README.md
examples/puzzletron/mbridge_distillation/distill_hf.py
examples/puzzletron/requirements.txt
modelopt/torch/prune/importance_hooks/__init__.py
modelopt/torch/prune/importance_hooks/base_hooks.py
modelopt/torch/prune/importance_hooks/expert_removal_hooks.py
modelopt/torch/puzzletron/README.md
modelopt/torch/puzzletron/__init__.py
modelopt/torch/puzzletron/activation_scoring/activation_hooks/__init__.py
modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py
modelopt/torch/puzzletron/activation_scoring/score_pruning_activations.py
modelopt/torch/puzzletron/anymodel/README.md
modelopt/torch/puzzletron/anymodel/__init__.py
modelopt/torch/puzzletron/anymodel/converter/__init__.py
modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py
modelopt/torch/puzzletron/anymodel/converter/converter.py
modelopt/torch/puzzletron/anymodel/converter/converter_factory.py
modelopt/torch/puzzletron/anymodel/model_descriptor/__init__.py
modelopt/torch/puzzletron/anymodel/model_descriptor/model_descriptor.py
modelopt/torch/puzzletron/anymodel/model_descriptor/model_descriptor_factory.py
modelopt/torch/puzzletron/anymodel/models/__init__.py
modelopt/torch/puzzletron/anymodel/models/gpt_oss/__init__.py
modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_converter.py
modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py
modelopt/torch/puzzletron/anymodel/models/llama/__init__.py
modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py
modelopt/torch/puzzletron/anymodel/models/llama/llama_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/mistral_small/__init__.py
modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_converter.py
modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h/__init__.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_converter.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/__init__.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_converter.py
modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/qwen2/__init__.py
modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_converter.py
modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/qwen3_8b/__init__.py
modelopt/torch/puzzletron/anymodel/models/qwen3_8b/qwen3_8b_converter.py
modelopt/torch/puzzletron/anymodel/models/qwen3_8b/qwen3_8b_model_descriptor.py
modelopt/torch/puzzletron/anymodel/models/qwen3_vl_30b_a3b_instruct/__init__.py
modelopt/torch/puzzletron/anymodel/models/qwen3_vl_30b_a3b_instruct/qwen3_vl_30b_a3b_instruct_converter.py
modelopt/torch/puzzletron/anymodel/models/qwen3_vl_30b_a3b_instruct/qwen3_vl_30b_
```
a3b_instruct_model_descriptor.pymodelopt/torch/puzzletron/anymodel/puzzformer/__init__.pymodelopt/torch/puzzletron/anymodel/puzzformer/no_op.pymodelopt/torch/puzzletron/anymodel/puzzformer/utils.pymodelopt/torch/puzzletron/build_library_and_stats.pymodelopt/torch/puzzletron/dataset/__init__.pymodelopt/torch/puzzletron/dataset/prepare_dataset.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/__init__.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.pymodelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.pymodelopt/torch/puzzletron/export/mbridge/__init__.pymodelopt/torch/puzzletron/export/mbridge/base.pymodelopt/torch/puzzletron/export/mbridge/distillation_provider.pymodelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.pymodelopt/torch/puzzletron/export/mbridge/llama.pymodelopt/torch/puzzletron/export/mbridge/qwen3.pymodelopt/torch/puzzletron/mip/mip_and_realize_models.pymodelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.pymodelopt/torch/puzzletron/mip/run_puzzle.pymodelopt/torch/puzzletron/mip/sweep.pymodelopt/torch/puzzletron/mip/utils.pymodelopt/torch/puzzletron/nas/plugins/puzzletron_nas_plugin.pymodelopt/torch/puzzletron/pruning/expert_removal_pruning_mixin.pymodelopt/torch/puzzletron/pruning/ffn_intermediate_pruning_mixin.pymodelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.pymodelopt/torch/puzzletron/pruning/pruning_ckpts.pymodelopt/torch/puzzletron/pruning/pruning_mixin.pymodelopt/torch/puzzletron/pruning/pruning_utils.pymodelopt/torch/puzzletron/puzzletron.pymodelopt/torch/puzzletron/replacement_library/build_replacement_library.pymodelopt/torch/puzzletron/replacement_library/replacement_library.pymodelopt/torch/puzzletron/replacement_library/replacement_utils.pymodelopt/torch/puzzletron/scoring/scoring.pymodelopt/torch/puzzletron/sewing_kit/__init__.pymodelopt/torch/puzzletron/sewing_kit/core.pymodelopt/torch/puzzletron/sewing_kit/passage/__init__.pymodelopt/torch/puzzletron/sewing_kit/passage/core.pymo
delopt/torch/puzzletron/sewing_kit/utils.pymodelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.pymodelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.pymodelopt/torch/puzzletron/tools/__init__.pymodelopt/torch/puzzletron/tools/bypassed_training/child_init.pymodelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.pymodelopt/torch/puzzletron/tools/checkpoint_utils.pymodelopt/torch/puzzletron/tools/checkpoint_utils_hf.pymodelopt/torch/puzzletron/tools/common.pymodelopt/torch/puzzletron/tools/hydra_utils.pymodelopt/torch/puzzletron/tools/kd_model.pymodelopt/torch/puzzletron/tools/logger.pymodelopt/torch/puzzletron/tools/post_init_sparse.pymodelopt/torch/puzzletron/tools/robust_json.pymodelopt/torch/puzzletron/tools/sharded_checkpoint_utils.pymodelopt/torch/puzzletron/tools/validate_model.pymodelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.pymodelopt/torch/puzzletron/tools/validation_utils.pymodelopt/torch/puzzletron/utils/checkpoint_manager.pymodelopt/torch/puzzletron/utils/data/dataloaders.pymodelopt/torch/puzzletron/utils/data/dataset.pymodelopt/torch/puzzletron/utils/dummy_modules.pymodelopt/torch/puzzletron/utils/parsing.pymodelopt/torch/puzzletron/utils/utils.pymodelopt/torch/puzzletron/utils/validate_runtime_pipeline.pymodelopt/torch/puzzletron/utils/validation.pymodelopt/torch/utils/robust_json.pypyproject.tomltests/_test_utils/torch/puzzletron/utils.pytests/examples/puzzletron/mbridge_distillation/test_distill_hf.pytests/gpu/torch/conftest.pytests/gpu/torch/puzzletron/conftest.pytests/gpu/torch/puzzletron/nas/plugins/test_nas_convert.pytests/gpu/torch/puzzletron/nas/plugins/test_nas_search.pytests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen2.5-7B-Instruct/Qwen2.5-7B-Instruct.yamltests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen2.5-7B-Instruct/pruning/ffn_pruning.yamltests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-8B/Qwen3-8B.yamltests/gpu/torch/puzzletron/resources/configs/Qwen
/Qwen3-8B/pruning/ffn_pruning.yamltests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-VL-30B-A3B-Instruct/Qwen3-VL-30B-A3B-Instruct.yamltests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-VL-30B-A3B-Instruct/pruning/expert_pruning.yamltests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/Llama-3.1-8B-Instruct-attn-pruning.yamltests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/Llama-3.1-8B-Instruct.yamltests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/pruning/attn_pruning.yamltests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/pruning/ffn_pruning.yamltests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.2-3B-Instruct/Llama-3.2-3B-Instruct.yamltests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.2-3B-Instruct/pruning/ffn_pruning.yamltests/gpu/torch/puzzletron/resources/configs/mistralai/Mistral-Small-24B-Instruct-2501/Mistral-Small-24B-Instruct-2501.yamltests/gpu/torch/puzzletron/resources/configs/mistralai/Mistral-Small-24B-Instruct-2501/pruning/ffn_pruning.yamltests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16.yamltests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/pruning/expert_pruning.yamltests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/pruning/ffn_pruning.yamltests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-Nano-12B-v2/NVIDIA-Nemotron-Nano-12B-v2.yamltests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-Nano-12B-v2/pruning/ffn_pruning.yamltests/gpu/torch/puzzletron/resources/configs/openai/gpt-oss-20b/gpt-oss-20b.yamltests/gpu/torch/puzzletron/resources/configs/openai/gpt-oss-20b/pruning/expert_removal.yamltests/gpu/torch/puzzletron/resources/configs/pruning/attn_pruning.yamltests/gpu/torch/puzzletron/resources/configs/pruning/ffn_pr
uning_base.yamltests/gpu/torch/puzzletron/resources/configs/pruning/hidden_dim_pruning.yamltests/gpu/torch/puzzletron/resources/configs/pruning/pruning_defaults.yamltests/gpu/torch/puzzletron/resources/configs/validate_model_defaults.yamltests/gpu/torch/puzzletron/resources/configs/validate_solutions_defaults.yamltests/gpu/torch/puzzletron/resources/tokenizer/special_tokens_map.jsontests/gpu/torch/puzzletron/resources/tokenizer/tokenizer.jsontests/gpu/torch/puzzletron/resources/tokenizer/tokenizer_config.jsontests/gpu/torch/puzzletron/resources/tokenizer/truncate_tokenizer.pytests/gpu/torch/puzzletron/test_puzzletron.pytox.ini
examples/puzzletron/configs/nemotron-nano-12b-v2/validate_solutions_defaults.yaml
```python
process_group = node.target.process_group
peers = node.target.peer_rank
if not isinstance(peers, Sequence):
    peers = [peers]

if len(values_to_node) > 0:
    items_to_send = list(self.create_output_overrides(values_to_node).items())

    data_descriptors: list[RemoteDataDescriptor] = []
    tensors_to_send: list[torch.Tensor] = []

    for key, value in items_to_send:
        if isinstance(value, torch.Tensor):
            if value.is_cuda:
                tensor_device = "cuda"
            elif value.is_cpu:
                tensor_device = "cpu"
            else:
                raise RuntimeError(
                    f"Invalid tensor device to send to remote target: {value.device}"
                )

            data_descriptor = RemoteTensorDataDescriptor(
                key=key,
                device=tensor_device,
                dtype=value.dtype,
                shape=value.shape,
            )
            tensors_to_send.append(value)

        else:
            data_descriptor = RemotePythonDataDescriptor(
                key=key,
                value=value,
            )

        data_descriptors.append(data_descriptor)

    works: list[Optional[torch.distributed.Work]] = []
    for peer in peers:
        if process_group is not None:
            peer = torch.distributed.get_global_rank(process_group, peer)

        peer_works = distributed_isend_obj(data_descriptors, dst=peer)
        works.extend(peer_works)

        for tensor in tensors_to_send:
            work = torch.distributed.isend(tensor, dst=peer)
            works.append(work)

    if node.target.blocking:
        for work in works:
            if work is not None:
                work.wait()
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, explore the file structure and locate the target file
find . -name "core.py" -path "*puzzletron/sewing_kit*" | head -20
```

Repository: NVIDIA/Model-Optimizer
Length of output: 167

🏁 Script executed:

```shell
# Check the file size and read the relevant sections
wc -l modelopt/torch/puzzletron/sewing_kit/core.py
```

Repository: NVIDIA/Model-Optimizer
Length of output: 114

🏁 Script executed:

```shell
# Read the specific lines mentioned in the review (624-677 and 687-718)
sed -n '620,730p' modelopt/torch/puzzletron/sewing_kit/core.py | cat -n
```

Repository: NVIDIA/Model-Optimizer
Length of output: 6346

🏁 Script executed:

```shell
# Search for all uses of distributed_isend_obj, distributed_recv_obj in the file
rg "distributed_(isend_obj|recv_obj|isend|irecv)" modelopt/torch/puzzletron/sewing_kit/core.py -B2 -A2
```

Repository: NVIDIA/Model-Optimizer
Length of output: 698

🏁 Script executed:

```shell
# Check the function signatures - find where these are imported from
rg "from.*distributed_isend_obj|import.*distributed_isend_obj" modelopt/torch/puzzletron/sewing_kit/ -B2 -A2
```

Repository: NVIDIA/Model-Optimizer
Length of output: 445

🏁 Script executed:

```shell
# Check if there's a utils module or similar that defines distributed_isend_obj
find modelopt/ -name "*.py" -type f -exec rg -l "def distributed_isend_obj|def distributed_recv_obj" {} \;
```

Repository: NVIDIA/Model-Optimizer
Length of output: 111

🏁 Script executed:

```shell
# Check the function signatures in utils.py
sed -n '1,200p' modelopt/torch/puzzletron/sewing_kit/utils.py | grep -A 15 "def distributed_isend_obj\|def distributed_recv_obj"
```

Repository: NVIDIA/Model-Optimizer
Length of output: 48

🏁 Script executed:

```shell
# Get full definitions of both functions
rg "def distributed_(isend_obj|recv_obj)" modelopt/torch/puzzletron/sewing_kit/utils.py -A 20
```

Repository: NVIDIA/Model-Optimizer
Length of output: 1582

🏁 Script executed:

```shell
# Check PyTorch distributed documentation - search for torch.distributed.isend signature
rg "torch.distributed.isend|torch.distributed.irecv" modelopt/torch/puzzletron/sewing_kit/ -B2 -A8
```

Repository: NVIDIA/Model-Optimizer
Length of output: 3244

🏁 Script executed:

```shell
# Search for any examples in the codebase that use group parameter with these functions
rg "distributed_isend_obj.*group|torch.distributed.isend.*group|torch.distributed.irecv.*group" modelopt/
```

Repository: NVIDIA/Model-Optimizer
Length of output: 652

🏁 Script executed:

```shell
# Search for any other potential P2P call sites in the file that might have similar issues
rg "distributed.isend|distributed.irecv|distributed.send|distributed.recv|distributed_isend|distributed_recv" modelopt/torch/puzzletron/sewing_kit/core.py -n
```

Repository: NVIDIA/Model-Optimizer
Length of output: 483
Pass group=process_group to all distributed P2P calls.
The code extracts process_group and converts local peer ranks to global ranks when process_group is not None (lines 646 and 688 via torch.distributed.get_global_rank()), but the actual distributed_isend_obj(), isend(), distributed_recv_obj(), and irecv() calls at lines 667, 671, 690, and 713 do not receive the group parameter. This causes all communication to use the default process group instead of the specified subgroup, resulting in hangs or communication with the wrong peers.
🛠️ Suggested fix
```diff
-        peer_works = distributed_isend_obj(data_descriptors, dst=peer)
+        peer_works = distributed_isend_obj(
+            data_descriptors,
+            dst=peer,
+            group=process_group,
+        )
@@
-            work = torch.distributed.isend(tensor, dst=peer)
+            work = torch.distributed.isend(
+                tensor,
+                dst=peer,
+                group=process_group,
+            )
@@
-    data_descriptors = distributed_recv_obj(src=peer)
+    data_descriptors = distributed_recv_obj(
+        src=peer,
+        group=process_group,
+    )
@@
-    work = torch.distributed.irecv(tensor, src=peer)
+    work = torch.distributed.irecv(
+        tensor,
+        src=peer,
+        group=process_group,
+    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/sewing_kit/core.py` around lines 624 - 677, The P2P
calls are missing the process group argument so they use the default group;
update all distributed send/recv calls to pass group=process_group where
available: add group=process_group to calls to distributed_isend_obj(...) and
distributed_recv_obj(...), and to torch.distributed.isend(...) and
torch.distributed.irecv(...); keep the existing logic that maps local->global
ranks via torch.distributed.get_global_rank(process_group, peer) and only pass
group when process_group is not None so subgroup communication uses the correct
group.
```python
def _load_subblock_configs_from_subblock_library(master_puzzle_dir: Path) -> list[SubblockConfig]:
    subblocks_df = pd.read_json(master_puzzle_dir / "subblock_library.json")
    subblocks_df["attention_config"] = subblocks_df["attention_config"].apply(
        partial(_dataclass_from_dict, cls=AttentionConfig)
    )
    subblocks_df["ffn_config"] = subblocks_df["ffn_config"].apply(
        partial(_dataclass_from_dict, cls=FFNConfig)
    )
    attention_configs = subblocks_df["attention_config"].dropna().drop_duplicates().tolist()
    ffn_configs = subblocks_df["ffn_config"].dropna().drop_duplicates().tolist()
    subblock_configs = attention_configs + ffn_configs
    return subblock_configs
```
Return the same indexed shape from the fallback loader.
This fallback returns bare AttentionConfig/FFNConfig instances, but calculate_subblock_stats() later indexes each item as ["subblock_config"] and ["parent_layer_indices"]. When replacement_library.json is absent, this path fails immediately instead of producing stats.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines
405 - 416, The fallback loader _load_subblock_configs_from_subblock_library
currently returns raw AttentionConfig/FFNConfig objects but callers
(calculate_subblock_stats) expect list entries with keys/fields
"subblock_config" and "parent_layer_indices" (i.e. SubblockConfig-shaped items);
change the function to construct and return SubblockConfig instances (or dicts
matching SubblockConfig) for each unique attention/ffn config, populating
subblock_config with the AttentionConfig/FFNConfig and setting
parent_layer_indices to a sensible default (e.g. empty list) or the original
parent indices if present in the JSON, so the returned list matches the same
indexed shape used by calculate_subblock_stats. Ensure the function name
_load_subblock_configs_from_subblock_library and the
SubblockConfig/AttentionConfig/FFNConfig symbols are used to locate and
instantiate the correct dataclass.
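The indexed shape the prompt asks for can be sketched with stand-in dataclasses (the `AttnCfg`/`FfnCfg` names below are hypothetical placeholders, not the real `AttentionConfig`/`FFNConfig`):

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the real AttentionConfig / FFNConfig.
@dataclass(frozen=True)
class AttnCfg:
    n_heads: int


@dataclass(frozen=True)
class FfnCfg:
    intermediate: int


raw_configs = [AttnCfg(8), AttnCfg(8), FfnCfg(4096)]

# Wrap each unique config in the mapping shape the caller indexes:
# item["subblock_config"] and item["parent_layer_indices"].
seen = []
entries = []
for cfg in raw_configs:
    if cfg in seen:
        continue  # drop duplicates, mirroring drop_duplicates()
    seen.append(cfg)
    entries.append({"subblock_config": cfg, "parent_layer_indices": []})
```

Each entry now carries an explicit (here empty) `parent_layer_indices` default, so downstream indexing cannot fail on a bare config object.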
```python
# Prepare arguments for parallel processing
process_layer_partial = partial(
    _process_single_layer,
    pruning_mixin=pruning_mixin,
    descriptor=descriptor,
    parent_state_dict=original_state_dict,
    new_state_dict=new_state_dict,
    original_config=original_config,
    new_config=new_config,
    gqa_init_mode=gqa_init_mode,
    mlp_init_mode=mlp_init_mode,
    mlp_init_config=mlp_init_config,
    linear_init_mode=linear_init_mode,
    ignored_keys=ignored_keys,
    keys=keys,
    is_original_mha=is_original_mha,
    head_size=head_size,
    hidden_size=hidden_size,
)
```
Don't share a mutable keys map across worker threads.
create_child_state_dict() passes the same keys dict into every worker, and _init_mlp() mutates it with keys.pop(...). That introduces a real race in the fallback path: workers can remove each other's entries, yielding nondeterministic KeyErrors or copying the wrong leftovers into the final state dict.
Also applies to: 452-455, 694-696
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/bypassed_training/child_init.py` around lines
425 - 443, The issue is that the same mutable keys dict is shared across worker
tasks causing races when _init_mlp mutates it; update each partial so it
receives a worker-local copy of keys (e.g., pass keys.copy() or dict(keys))
instead of the original dict when creating process_layer_partial (and the other
partials at the other sites noted), and ensure
create_child_state_dict/_process_single_layer/_init_mlp keep using their
existing names (create_child_state_dict, _process_single_layer, _init_mlp) but
operate on the copied keys to avoid cross-thread pops and nondeterministic
KeyError or wrong-state copying.
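The per-worker-copy fix can be demonstrated with a small standard-library sketch (the `worker` function and `keys` layout here are hypothetical, not the actual `child_init.py` code):

```python
from concurrent.futures import ThreadPoolExecutor

def worker(layer_idx, keys):
    # Pops an entry from its own private map. If all workers shared
    # one dict, a concurrent pop of the same key would raise KeyError.
    return keys.pop(f"layer.{layer_idx}.weight")

keys = {f"layer.{i}.weight": i for i in range(8)}

# Give every task its own copy; note [dict(keys)] * 8 would NOT work,
# since it repeats one copy eight times.
copies = [dict(keys) for _ in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(worker, range(8), copies))
```

Copying an eight-entry dict per task is cheap next to the state-dict work each worker does, which is why the prompt prefers copies over locking.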
```python
def _write_file_process_safe(
    content: Any,
    path: Path | str,
    write_fn: Callable[[Any, BinaryIO], None] = _write_text,
) -> None:
    """
    Write a file in a multi-process safe way.
    If another process tries to write the same file using this method, the current process
    "gives up" and assumes that the matter is being taken care of by another process.

    write_fn is a function that receives file contents and a binary file object,
    and writes the content to the file. It can be _write_text (defined above), or torch.save,
    or a similar function (not safetensors.torch.save_file since it expects a path).
    """
    with open(path, "wb") as f:
        # Try to acquire an exclusive, non-blocking lock
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return  # Exit immediately if the lock is not acquired

        write_fn(content, f)  # Write the content if lock is acquired
        f.flush()  # Ensure data is written to disk

        # Release the lock
        fcntl.flock(f, fcntl.LOCK_UN)
```
Acquire the file lock before truncating the index file.
open(path, "wb") truncates the file before flock() runs. If another process already owns the lock, this contender still zeroes the file and then returns, so the writer you intended to protect can lose or corrupt the index.
🛠️ Proposed fix
```diff
-    with open(path, "wb") as f:
+    with open(path, "ab+") as f:
         # Try to acquire an exclusive, non-blocking lock
         try:
             fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
         except BlockingIOError:
             return  # Exit immediately if the lock is not acquired
-
+        f.seek(0)
+        f.truncate()
         write_fn(content, f)  # Write the content if lock is acquired
         f.flush()  # Ensure data is written to disk
+        os.fsync(f.fileno())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 335 -
360, The current _write_file_process_safe opens the file with "wb" which
truncates before acquiring the flock, risking data loss; change the open/lock
sequence so you acquire the exclusive non-blocking lock before
truncating/writing: open the file in a non-truncating mode that still creates
the file if missing (e.g., "a+b"), immediately acquire fcntl.flock(f,
fcntl.LOCK_EX | fcntl.LOCK_NB), return on BlockingIOError, then seek to start
and f.truncate(0) (or otherwise reset) before calling write_fn(content, f),
flushing, and finally releasing the lock in _write_file_process_safe. Ensure you
still support write_fn semantics and file creation behavior.
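The open-then-lock-then-truncate ordering described in the prompt can be sketched as a standalone helper (a simplified, hypothetical `write_locked`, Unix-only via `fcntl`, without the `write_fn` indirection of the real code):

```python
import fcntl
import os

def write_locked(path: str, data: bytes) -> bool:
    """Write `data` to `path` only if an exclusive lock is acquired.

    Opening with "a+b" creates the file if needed WITHOUT truncating,
    so a contender that fails to get the lock never clobbers the bytes
    the current lock holder is writing.
    """
    with open(path, "a+b") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # another process owns the file; give up
        # Only truncate once we hold the lock. In append mode, writes
        # land at the end of the (now empty) file, i.e. at offset 0.
        f.seek(0)
        f.truncate()
        f.write(data)
        f.flush()
        os.fsync(f.fileno())
        fcntl.flock(f, fcntl.LOCK_UN)
    return True
```

The boolean return mirrors the "give up" semantics of the original docstring: a `False` result means another writer is assumed to be handling the file.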
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1121      +/-   ##
==========================================
+ Coverage   70.14%   70.17%   +0.02%
==========================================
  Files         230      230
  Lines       26053    26057       +4
==========================================
+ Hits        18276    18285       +9
+ Misses       7777     7772       -5
```

☔ View full report in Codecov by Sentry.
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
1b25b4a to 7e15fdd (Compare)
Actionable comments posted: 1
🧹 Nitpick comments (3)
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py (3)
76-82: Consider deleting the frame reference to avoid reference cycles.

Per Python documentation, holding references to frame objects can prevent garbage collection. Delete the frame reference when done.
♻️ Proposed fix
```diff
 def _is_called_from_post_init() -> bool:
     frame = inspect.currentframe()
-    while frame:
-        if frame.f_code.co_name == "__post_init__":
-            return True
-        frame = frame.f_back
-    return False
+    try:
+        while frame:
+            if frame.f_code.co_name == "__post_init__":
+                return True
+            frame = frame.f_back
+        return False
+    finally:
+        del frame
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py` around lines 76 - 82, The function _is_called_from_post_init currently retains a reference to the inspected frame object which can create reference cycles; update the function to ensure the frame reference is cleared when done (e.g., use a try/finally or explicitly del frame before returning) so the frame object can be garbage-collected; keep the existing traversal logic (checking frame.f_code.co_name == "__post_init__") but ensure you delete the frame variable at the end of the function and also delete any temporary frame references used inside the loop.
89-93: Bare `except:` catches too much.

This catches `SystemExit`, `KeyboardInterrupt`, and other exceptions that shouldn't be silently swallowed. Use `except Exception:` to catch only regular exceptions.

♻️ Proposed fix
```diff
 try:
     _get_dataclass_type(tp)
     return True
-except:
+except Exception:
     return False
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py` around lines 89 - 93, The current try/except around the call to _get_dataclass_type uses a bare except which swallows SystemExit/KeyboardInterrupt; change the bare except to catch only regular exceptions (use except Exception:) in the block that tries _get_dataclass_type and returns True/False so only normal errors are caught while allowing critical exceptions to propagate.
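The difference is easy to demonstrate with a minimal sketch (`safe_check` below is a hypothetical stand-in for the real helper): `except Exception` swallows ordinary errors but lets `SystemExit` escape, which a bare `except:` would not.

```python
def safe_check(fn) -> bool:
    """Return True if fn() completes without a regular exception.

    `except Exception` deliberately excludes BaseException subclasses
    such as SystemExit and KeyboardInterrupt; a bare `except:` would
    silently swallow those too.
    """
    try:
        fn()
        return True
    except Exception:
        return False

def boom():
    raise SystemExit(1)

escaped = False
try:
    safe_check(boom)
except SystemExit:
    escaped = True  # SystemExit passed straight through the handler
```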
96: Incorrect return type annotation.

`dataclass` is a decorator, not a type. The return type should be `Type[Any]` or similar.

♻️ Proposed fix
```diff
-def _get_dataclass_type(tp: Type) -> dataclass:
+def _get_dataclass_type(tp: Type) -> Type[Any]:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py` at line 96, The return type annotation on _get_dataclass_type is wrong: `dataclass` is a decorator, not a type; change the signature to return a type object (e.g. use -> Type[Any] or a generic -> Type[T]) and update imports as needed (from typing import Any, Type or TypeVar T) so the function signature becomes something like def _get_dataclass_type(tp: Type) -> Type[Any]:; ensure any callers/types align with the new typing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py`:
- Around line 146-155: The __post_init__ validation in class BlockConfig
incorrectly references self.top_k (undefined) instead of
self.num_experts_per_tok; update all occurrences in the __post_init__ method
(the checks that raise ValueError) to use self.num_experts_per_tok for the
numeric comparisons and the error messages, and make the messages consistent
(e.g., "num_experts_per_tok must be positive" and "num_experts_per_tok (...)
cannot be greater than num_local_experts (...)") so no AttributeError occurs at
runtime.
---
Nitpick comments:
In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py`:
- Around line 76-82: The function _is_called_from_post_init currently retains a
reference to the inspected frame object which can create reference cycles;
update the function to ensure the frame reference is cleared when done (e.g.,
use a try/finally or explicitly del frame before returning) so the frame object
can be garbage-collected; keep the existing traversal logic (checking
frame.f_code.co_name == "__post_init__") but ensure you delete the frame
variable at the end of the function and also delete any temporary frame
references used inside the loop.
- Around line 89-93: The current try/except around the call to
_get_dataclass_type uses a bare except which swallows
SystemExit/KeyboardInterrupt; change the bare except to catch only regular
exceptions (use except Exception:) in the block that tries _get_dataclass_type
and returns True/False so only normal errors are caught while allowing critical
exceptions to propagate.
- Line 96: The return type annotation on _get_dataclass_type is wrong:
`dataclass` is a decorator, not a type; change the signature to return a type
object (e.g. use -> Type[Any] or a generic -> Type[T]) and update imports as
needed (from typing import Any, Type or TypeVar T) so the function signature
becomes something like def _get_dataclass_type(tp: Type) -> Type[Any]:; ensure
any callers/types align with the new typing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c79afac1-3372-4f26-862d-05cd1e49a5be
📒 Files selected for processing (4)
- .github/workflows/example_tests.yml
- .github/workflows/gpu_tests.yml
- .pre-commit-config.yaml
- modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py
✅ Files skipped from review due to trivial changes (1)
- .github/workflows/gpu_tests.yml
🚧 Files skipped from review as they are similar to previous changes (2)
- .pre-commit-config.yaml
- .github/workflows/example_tests.yml
```python
def __post_init__(self):
    # Validate the configuration
    if self.num_local_experts <= 0:
        raise ValueError(f"num_local_experts must be positive, got {self.num_local_experts}")
    if self.num_experts_per_tok <= 0:
        raise ValueError(f"top_k must be positive, got {self.top_k}")
    if self.num_experts_per_tok > self.num_local_experts:
        raise ValueError(
            f"top_k ({self.top_k}) cannot be greater than num_local_experts ({self.num_local_experts})"
        )
```
Critical bug: self.top_k is undefined; should be self.num_experts_per_tok.
The validation references self.top_k in error messages and comparisons, but the field is named num_experts_per_tok. This will raise AttributeError at runtime whenever these validation branches are hit.
🐛 Proposed fix
```diff
 def __post_init__(self):
     # Validate the configuration
     if self.num_local_experts <= 0:
         raise ValueError(f"num_local_experts must be positive, got {self.num_local_experts}")
     if self.num_experts_per_tok <= 0:
-        raise ValueError(f"top_k must be positive, got {self.top_k}")
+        raise ValueError(f"num_experts_per_tok must be positive, got {self.num_experts_per_tok}")
     if self.num_experts_per_tok > self.num_local_experts:
         raise ValueError(
-            f"top_k ({self.top_k}) cannot be greater than num_local_experts ({self.num_local_experts})"
+            f"num_experts_per_tok ({self.num_experts_per_tok}) cannot be greater than num_local_experts ({self.num_local_experts})"
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py` around
lines 146 - 155, The __post_init__ validation in class BlockConfig incorrectly
references self.top_k (undefined) instead of self.num_experts_per_tok; update
all occurrences in the __post_init__ method (the checks that raise ValueError)
to use self.num_experts_per_tok for the numeric comparisons and the error
messages, and make the messages consistent (e.g., "num_experts_per_tok must be
positive" and "num_experts_per_tok (...) cannot be greater than
num_local_experts (...)") so no AttributeError occurs at runtime.
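The corrected validation can be exercised on a minimal stand-in config (a hypothetical `MoEConfig`, not the real `BlockConfig`):

```python
from dataclasses import dataclass

@dataclass
class MoEConfig:
    num_local_experts: int
    num_experts_per_tok: int

    def __post_init__(self):
        # Every message references the field it actually checks,
        # so no undefined attribute (like the old top_k) is touched.
        if self.num_local_experts <= 0:
            raise ValueError(
                f"num_local_experts must be positive, got {self.num_local_experts}"
            )
        if self.num_experts_per_tok <= 0:
            raise ValueError(
                f"num_experts_per_tok must be positive, got {self.num_experts_per_tok}"
            )
        if self.num_experts_per_tok > self.num_local_experts:
            raise ValueError(
                f"num_experts_per_tok ({self.num_experts_per_tok}) cannot be "
                f"greater than num_local_experts ({self.num_local_experts})"
            )

ok = MoEConfig(num_local_experts=8, num_experts_per_tok=2)
try:
    MoEConfig(num_local_experts=8, num_experts_per_tok=9)
except ValueError as e:
    err = str(e)
```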
…ants Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Actionable comments posted: 13
♻️ Duplicate comments (2)
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py (1)
404-415: ⚠️ Potential issue | 🟠 Major: Return format mismatch in fallback loader (existing issue).
The `_load_subblock_configs_from_subblock_library` function returns raw `AttentionConfig`/`FFNConfig` objects, but `calculate_subblock_stats` (lines 151-152) expects items with `["subblock_config"]` and `["parent_layer_indices"]` keys. This will cause a runtime error when `replacement_library.json` is absent.

🐛 Proposed fix
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines 404 - 415, The fallback loader _load_subblock_configs_from_subblock_library currently returns raw AttentionConfig/FFNConfig objects but calculate_subblock_stats expects each item to be a mapping with keys "subblock_config" and "parent_layer_indices"; change _load_subblock_configs_from_subblock_library to wrap each config into a dict like {"subblock_config": <config>, "parent_layer_indices": []} (use an empty list or appropriate default for parent_layer_indices) for both attention_configs and ffn_configs so the combined subblock_configs list matches the shape expected by calculate_subblock_stats.

modelopt/torch/puzzletron/mip/run_puzzle.py (1)
396-424: ⚠️ Potential issue | 🟠 Major: CLI validation crashes on `argparse.Namespace` (existing issue).
The checks like `"subblock_stats_args" not in args` (line 408) use the `in` operator on `argparse.Namespace`, which is not iterable and will raise `TypeError`. The past review comment already flagged this issue.

🐛 Proposed fix

```diff
 def _assert_valid_config(args, puzzle_profile):
     required_args = (
         "subblock_stats_path",
         "objective",
         "output_path",
     )
-    missing_args = [arg for arg in required_args if arg not in args or getattr(args, arg) is None]
+    missing_args = [arg for arg in required_args if getattr(args, arg, None) is None]
     if missing_args:
         mprint(f"error: The following arguments are required: {', '.join(missing_args)}")
         sys.exit(1)

     # Make sure we have specified subblock_stats_args
-    if "subblock_stats_args" not in args and "subblock_stats_args" not in puzzle_profile:
+    if getattr(args, "subblock_stats_args", None) is None and "subblock_stats_args" not in puzzle_profile:
         mprint(
             "error: Must specify `subblock_stats_arrs` in either puzzle_profile or as a commandline arg."
         )
         sys.exit(1)

     # Make sure we have specified constraints
     if (
-        "mip_constraints" not in args
-        and "human_constraints" not in args
+        getattr(args, "mip_constraints", None) is None
+        and getattr(args, "human_constraints", None) is None
         and "mip_constraints" not in puzzle_profile
         and "human_constraints" not in puzzle_profile
     ):
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/mip/run_puzzle.py` around lines 396 - 424, In _assert_valid_config, the checks like `"subblock_stats_args" not in args` and similar use the in-operator on argparse.Namespace (causing TypeError); change those presence checks to use hasattr(args, "subblock_stats_args") or getattr(args, "subblock_stats_args") is not None for each CLI arg check, while keeping the existing membership checks against puzzle_profile (which is a dict); update the two conditional blocks that validate subblock_stats_args and the constraints (mip_constraints, human_constraints) to use hasattr/getattr on args and keep the puzzle_profile checks as-is so validation works for both Namespace and profile.
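The `getattr`-based presence check suggested in the prompt above can be sketched as a small helper. `missing_required` is a hypothetical stand-in, not the actual `_assert_valid_config`; the point is that `getattr` with a default never raises, whether or not the attribute exists on the Namespace:

```python
import argparse


def missing_required(args: argparse.Namespace, required: tuple) -> list:
    # getattr(args, name, None) works for absent attributes and for
    # attributes explicitly set to None, with no reliance on `in`.
    return [name for name in required if getattr(args, name, None) is None]
```

A parsed Namespace from any parser can be passed in directly; no special handling is needed for arguments the parser never defined.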
🧹 Nitpick comments (9)
modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py (1)
33-33: Return type annotation mismatch with actual return value.
The signature declares `List[BlockConfig]` but the method returns `List[dict]` after calling `.to_dict()`. While context snippet 4 shows the Llama converter follows the same pattern (so this is established behavior), it creates a disconnect with the base class contract in `converter.py:193-210`.
Consider updating the return type to `List[Dict[str, Any]]` to accurately reflect what is returned, or coordinate a broader fix across all converters.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py` at line 33, The function create_block_configs_from_main_config currently declares a return type of List[BlockConfig] but actually returns a list of dicts (via .to_dict()); update the signature to return List[Dict[str, Any]] and add the necessary typing imports (Dict, Any) so the annotation matches the actual return value, and scan related converters (e.g., the Llama converter and base converter methods) to ensure callers expect dicts or adjust them consistently.

modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_model_descriptor.py (1)
64-65: Add return type annotation for consistency with base class.
The base class in `model_descriptor.py:42-54` declares `block_config_to_layer_overrides` with return type `Dict[str, Any]`. Adding the annotation here would improve type safety and IDE support.

♻️ Proposed fix

```diff
+from typing import Any, Dict, List
-from typing import Dict, List
 ...
     @staticmethod
-    def block_config_to_layer_overrides(block_config: BlockConfig):
+    def block_config_to_layer_overrides(block_config: BlockConfig) -> Dict[str, Any]:
         return {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_model_descriptor.py` around lines 64 - 65, The method block_config_to_layer_overrides in qwen3_model_descriptor.py is missing the return type annotation declared on the base class; update its signature to include -> Dict[str, Any] (import Dict and Any from typing if not already present) so it matches the base class declaration in model_descriptor.py (see block_config_to_layer_overrides and BlockConfig) and restores proper type safety and IDE support.

modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py (1)
15-15: Narrow the file-wide mypy suppression.
Line 15 (`# mypy: ignore-errors`) disables type checking for the entire file. Multiple static methods lack return type annotations (e.g., `decoder_layer_cls()`, `input_embedding_name()`, `layer_block_name(index: int)`, etc.). Add return type annotations to these methods and use targeted `# type: ignore` comments only where necessary (e.g., for untyped third-party imports from transformers).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py` at line 15, The file-wide "# mypy: ignore-errors" must be removed and replaced by precise type annotations on the static methods (e.g., add return types to decoder_layer_cls, input_embedding_name, layer_block_name(index: int), and other static methods) and only use targeted "# type: ignore" comments for specific untyped third-party imports; for example declare decoder_layer_cls() -> Type[Any] or the proper transformer layer class, input_embedding_name() -> str, layer_block_name(index: int) -> str, and any other methods with explicit return types so mypy can validate the file while leaving isolated ignores on the import lines that actually require them.

modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_converter.py (1)
74-76: Consider using `mprint` instead of `print`.
Other modules in this PR use `mprint` from `modelopt.torch.puzzletron.tools.logger` for consistent logging. Using `print()` directly may not integrate well with logging configurations.

♻️ Suggested change

```diff
+from modelopt.torch.puzzletron.tools.logger import mprint
+
 ...
-    print(
+    mprint(
         f"Created {len(block_configs)} block configs for Qwen3-VL MoE (decoder_sparse_step={decoder_sparse_step})"
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_converter.py` around lines 74 - 76, Replace the direct print call with the project logging helper: import mprint from modelopt.torch.puzzletron.tools.logger and change the print(...) in qwen3_vl_converter.py to mprint(f"Created {len(block_configs)} block configs for Qwen3-VL MoE (decoder_sparse_step={decoder_sparse_step})") so the message uses the consistent logger; ensure the import is added near other imports and no other behavior changes to block_configs or decoder_sparse_step.

modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py (1)
27-27: Setting environment variable at module import time has global side effects.
Setting `os.environ["PYTORCH_CUDA_ALLOC_CONF"]` at the top level of the module means this side effect occurs whenever any code imports this module, even if that code doesn't use the functions here. Consider moving this inside the functions that need it, or making it conditional.

♻️ Suggested change

```diff
-os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
+def _set_cuda_alloc_config():
+    """Set CUDA allocator config for memory-intensive operations."""
+    if "PYTORCH_CUDA_ALLOC_CONF" not in os.environ:
+        os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```

Then call `_set_cuda_alloc_config()` at the start of `calculate_subblock_stats` or `calculate_subblock_stats_for_puzzle_dir`.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` at line 27, The module currently sets os.environ["PYTORCH_CUDA_ALLOC_CONF"] at import time which creates a global side effect; instead create a small helper (e.g. _set_cuda_alloc_config()) that sets that env var only when needed, remove the top-level assignment, and call _set_cuda_alloc_config() at the start of calculate_subblock_stats and calculate_subblock_stats_for_puzzle_dir so the CUDA config is applied only when those functions run (ensure the helper is idempotent to avoid repeated sets).

modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py (2)
165-166: Remove duplicate variable assignment.
`actual_io_workers` is computed on line 165 and then recomputed identically on line 181. The second assignment is redundant.

♻️ Suggested fix

```diff
     # Profile _save_checkpoint with automatic I/O worker calculation
     mprint("Starting _save_checkpoint...")
-    actual_io_workers = max_workers if max_workers else "auto"
-    mprint(f"I/O Settings: max_workers={actual_io_workers}")
+    mprint(f"I/O Settings: max_workers={max_workers if max_workers else 'auto'}")
     start_time = time.time()
 ...
     # Print profiling summary with actual worker counts used
     total_core_time = create_child_state_dict_time + save_checkpoint_time
     actual_layer_workers = max_layer_workers if max_layer_workers else "auto"
-    actual_io_workers = max_workers if max_workers else "auto"
+    actual_io_workers = max_workers if max_workers else "auto"  # keep this one
```

Also applies to: 180-181
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py` around lines 165 - 166, Remove the duplicated assignment of actual_io_workers and its associated mprint call in init_child_from_parent.py: keep the first computation (actual_io_workers = max_workers if max_workers else "auto") and the initial mprint("I/O Settings:..."), and delete the later identical reassignment and print (the repeated lines around where actual_io_workers is set again near line 181). This removes the redundant code while leaving any subsequent uses of actual_io_workers and the mprint calls (function mprint) intact.
90-91: Add error handling for JSON parsing.
If `model_config_overrides_dict` is a malformed JSON string, `json.loads()` will raise a `JSONDecodeError`. Consider wrapping this in a try-except with a more informative error message.

🛡️ Suggested improvement

```diff
     # Parse JSON if string
     if isinstance(model_config_overrides_dict, str):
-        model_config_overrides_dict = json.loads(model_config_overrides_dict)
+        try:
+            model_config_overrides_dict = json.loads(model_config_overrides_dict)
+        except json.JSONDecodeError as e:
+            raise ValueError(f"Invalid JSON in model_config_overrides_dict: {e}") from e
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py` around lines 90 - 91, Wrap the json.loads call that parses model_config_overrides_dict in a try-except that catches json.JSONDecodeError, so malformed JSON yields a clear, actionable error; specifically, around the block that checks isinstance(model_config_overrides_dict, str) and calls json.loads, catch JSONDecodeError and raise a ValueError (or log and raise) with a message that includes that model_config_overrides_dict failed to parse and include the offending string (or truncated preview) and the original exception message to aid debugging.

modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py (2)
119-126: Consider adding defensive checks for missing attributes.
`_block_no_op_post_init` accesses `decoder_layer.config.block_configs[decoder_layer.layer_idx]` directly. If `decoder_layer` lacks `config`, `block_configs`, or `layer_idx` attributes (e.g., when called on a DummyBlock during initialization), this will raise an `AttributeError`.

🛡️ Suggested defensive check

```diff
     @staticmethod
     def _block_no_op_post_init(decoder_layer):
         """
         Due to the subblock structure of NemotronH always one of the subblock is set to no-op,
         for a real no-op both attention & ffn no-op should be set to True.
         """
+        if not hasattr(decoder_layer, 'config') or not hasattr(decoder_layer, 'layer_idx'):
+            return
         block_config = decoder_layer.config.block_configs[decoder_layer.layer_idx]
         if block_config.ffn.no_op and block_config.attention.no_op:
             decoder_layer.norm = Same()
             decoder_layer.mixer = MatchingZeros()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py` around lines 119 - 126, The _block_no_op_post_init function assumes decoder_layer has config, block_configs, and layer_idx; add defensive checks at the start of _block_no_op_post_init to avoid AttributeError (e.g., if called on a DummyBlock): verify hasattr(decoder_layer, "config"), hasattr(decoder_layer.config, "block_configs"), and hasattr(decoder_layer, "layer_idx") (or use a try/except AttributeError), ensure decoder_layer.layer_idx is an int within range of decoder_layer.config.block_configs, and return early if any check fails; then proceed to access block_config and apply the existing ffn/attention no_op logic on decoder_layer.norm and decoder_layer.mixer.
40-52: Dynamic module discovery may be slow for large module trees.
The `get_dynamic_modules` function walks all packages under `transformers_modules` and imports each module to find matching classes. This could be slow if the module tree is large. Consider caching the result or limiting the search scope if performance becomes an issue.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py` around lines 40 - 52, get_dynamic_modules currently imports every module under transformers_modules which can be slow; to fix, add caching and an optional scope parameter: decorate get_dynamic_modules with functools.lru_cache (or maintain a module-level cache keyed by module_cls_str and optional scope), and add an optional argument (e.g., package_whitelist: Optional[Iterable[str]] or max_results/first_match: int) so the caller can limit search to specific subpackages or stop after N matches; update the function body to skip packages not in the whitelist and to short-circuit when max_results is reached, keeping references to get_dynamic_modules, transformers_modules, matches, and the import/package iteration logic so changes are easy to locate.
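One way to sketch the caching suggested above: scan modules already present in `sys.modules` instead of importing every package, and memoize the result with `functools.lru_cache`. `find_cached_classes` is a hypothetical stand-in for `get_dynamic_modules`, not the project's implementation:

```python
import functools
import sys


@functools.lru_cache(maxsize=None)
def find_cached_classes(class_name: str, package_prefix: str) -> tuple:
    """Collect classes named `class_name` from already-imported modules
    whose names start with `package_prefix`; memoized to avoid rescans."""
    matches = []
    for module in list(sys.modules.values()):
        if module is None or not getattr(module, "__name__", "").startswith(package_prefix):
            continue
        for obj in list(vars(module).values()):
            if isinstance(obj, type) and obj.__name__ == class_name:
                matches.append(obj)
    # Deduplicate while preserving discovery order.
    return tuple(dict.fromkeys(matches))
```

Falling back to a scoped `pkgutil.walk_packages` only when this returns nothing would preserve the original behavior for modules not yet imported.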
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py`:
- Around line 90-93: The MoE override assignments in the block-construction path
are writing None into override_kwargs when a partial BlockConfig.ffn.moe is
provided; update the logic in the block config handling (the code that builds
override_kwargs for MoE) to check each MoE field individually before setting
override_kwargs["moe_intermediate_size"] and override_kwargs["n_routed_experts"]
(i.e., only assign block_config.ffn.moe.expert_intermediate_dim and
block_config.ffn.moe.num_local_experts if they are not None), so that
unspecified MoE values do not overwrite model defaults in the NemotronH_v2 model
descriptor.
- Around line 168-174: The loop that filters singleton groups (weight_groups ->
valid_weight_groups) is too broad: it currently skips any single entry ending
with "norm.weight" and containing "layers", which can accidentally drop valid
tensors like "backbone.layers.<i>.mixer.norm.weight". Change the condition in
that loop (variables: weight_groups, valid_weight_groups, only_name) to only
skip when only_name exactly matches the root-layer norm pattern, e.g. use a
regex or precise string check to match "backbone.layers.<number>.norm.weight"
(backbone\.layers\.\d+\.norm\.weight) and only continue in that case. Ensure
other single-entry names that end with "norm.weight" but include additional
subblocks (like ".mixer.") are preserved.
- Around line 39-51: The dynamic discovery currently imports every module in the
shared transformers_modules cache; change get_dynamic_modules to first search
already-imported modules in sys.modules whose names start with
transformers_modules.__name__ + "." and collect matching classes, and only if
none found (or if a specific checkpoint package name is provided) limit
pkgutil.walk_packages to that specific checkpoint subpackage instead of walking
transformers_modules.__path__ entirely; update uses of pkgutil.walk_packages and
importlib.import_module in get_dynamic_modules to restrict imports to that
narrower package scope (reference symbols: get_dynamic_modules, module_cls_str,
transformers_modules, pkgutil.walk_packages, importlib.import_module).
In `@modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py`:
- Around line 24-28: The module currently does eager, top-level imports of
Qwen2DecoderLayer, Qwen2ForCausalLM, and Qwen2RotaryEmbedding which breaks when
transformers is absent; change to lazy imports by removing those module-level
imports and instead: add a TYPE_CHECKING guard to import those types for type
hints only, and import the actual classes inside the functions/methods that
instantiate or return them (e.g., wherever Qwen2ForCausalLM, Qwen2DecoderLayer,
or Qwen2RotaryEmbedding are used in this descriptor), or use the repository's
import_plugin() pattern to load transformers at call time so the optional
dependency is gated at runtime. Ensure all usage sites reference the locally
imported symbols and keep type hints under TYPE_CHECKING.
In
`@modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py`:
- Around line 22-26: The module-level imports of Qwen3VLMoeTextDecoderLayer,
Qwen3VLMoeTextRotaryEmbedding, and Qwen3VLMoeVisionRotaryEmbedding will raise
ModuleNotFoundError on transformers 4.56.x; either bump the transformers
requirement to >=4.57.0 in pyproject.toml, or change the top-level import in
qwen3_vl_model_descriptor.py to a guarded lazy import: wrap the current import
block for
Qwen3VLMoeTextDecoderLayer/Qwen3VLMoeTextRotaryEmbedding/Qwen3VLMoeVisionRotaryEmbedding
in try/except ImportError and fall back to importing alternatives or deferring
import inside the functions/classes that use them so the module can load under
older transformers versions.
In `@modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py`:
- Around line 36-42: The block_configs list currently uses [block_config] *
num_hidden_layers which creates repeated references to the same BlockConfig
dict; change this to produce independent dicts per layer by constructing
block_configs with a comprehension or loop that calls BlockConfig(...).to_dict()
(or deep-copies) for each index so each entry is its own object—modify the code
around the BlockConfig/AttentionConfig/FFNConfig creation and the block_configs
variable in qwen3_converter.py to use a list comprehension like creating a new
BlockConfig per iteration rather than repeating the same dict reference.
In `@modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py`:
- Line 20: Remove the module-level "from megatron.bridge import AutoBridge" and
instead call import_plugin("megatron.bridge") inside the export routine (e.g.,
export_mbridge_to_hf) to lazily load the optional Megatron integration; assign
the returned module or object to a local name and use that (or its AutoBridge
attribute) where currently AutoBridge is referenced so imports don't fail when
Megatron is absent. Ensure you import and use the existing import_plugin()
helper and update any code that referred to the top-level AutoBridge symbol to
use the locally loaded module/attribute.
In `@modelopt/torch/puzzletron/mip/run_puzzle.py`:
- Around line 158-160: Assertion message references 'block_size' while the code
checks for 'batch_size' and 'generation_seq_len'; update the assertion message
in the assert statement that checks all(key in subblock_stats_args for key in
("batch_size", "generation_seq_len")) to mention 'batch_size' instead of
'block_size' (e.g., "Can't realize human constraints without 'batch_size' and
'generation_seq_len' in subblock_stats_args.") so the text matches the checked
keys.
In `@modelopt/torch/puzzletron/sewing_kit/utils.py`:
- Line 15: Remove the file-wide directive "# mypy: ignore-errors" at the top of
the sewing_kit.utils module; instead run mypy to locate the specific type
errors, add precise type annotations to public functions/classes in this module,
and apply narrowly-scoped ignores (e.g., "# type: ignore[assignment]" or "#
type: ignore[arg-type]") only on the specific Torch internals or import lines
that mypy cannot model; ensure exported symbols have explicit return/argument
types so the module is type-checkable without disabling mypy globally.
- Around line 94-126: Fix three bugs in ActivityContext: in __init__/__call__
stop using insert(self.head_index) for normal order — for reversed=True insert
at index 0, otherwise use list.append to push to the end so nested contexts
unwind LIFO; in __call__ only pop when this call actually pushed an item (set a
local flag pushed=False, set pushed=True after insert/append, and in finally
only pop if pushed using pop(0) for reversed or pop() for normal); and correct
the is_active check elsewhere to call the method (use self.is_active() not
self.is_active) so get_active() can return None for empty stacks. Ensure
references changed affect __init__, __call__, is_active, and get_active.
- Around line 357-373: Update _get_device_for_distributed to consistently return
torch.device objects (change the return type annotation from -> str to ->
torch.device); when not initialized return torch.device("cpu"), and when backend
is "nccl" return torch.device("cuda", torch.cuda.current_device()) (or
torch.device("cuda", index) using torch.cuda.current_device()) so both branches
return torch.device instances and usage with Tensor.to() is consistent.
In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py`:
- Around line 131-140: The call to calc_runtime_ms_for_subblocks inside the
block guarded by is_calc_runtime is unreachable because earlier code raises
NotImplementedError for is_calc_runtime==True; remove the dead call to
calc_runtime_ms_for_subblocks (and any related vars runtime_by_subblock_dict,
non_block_runtime_ms) or mark it explicitly as a TODO/placeholder with a clear
comment linking calc_subblock_stats and is_calc_runtime so future implementers
know to implement/ import calc_runtime_ms_for_subblocks; ensure no other
references to runtime_by_subblock_dict or non_block_runtime_ms remain in
calc_subblock_stats.
- Around line 317-320: Replace the incorrect use of the Warning(...) constructor
with warnings.warn(...) so the message is actually emitted (e.g.,
warnings.warn(f"MOE stats file {moe_stats_file} does not exist, can't calculate
num active params", UserWarning, stacklevel=2)); ensure the warnings module is
imported at top of the file and remove the unused Warning() call or exception
object assignment; locate the call by searching for the Warning(...) invocation
or the moe_stats_file variable in calc_subblock_stats.py to apply the change.
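The precise-match filter suggested for the norm-weight grouping can be sketched like this; the helper name is hypothetical, and the pattern is the one quoted in the comment:

```python
import re

# Matches only root-layer norms, e.g. "backbone.layers.12.norm.weight".
ROOT_NORM = re.compile(r"backbone\.layers\.\d+\.norm\.weight")


def is_root_layer_norm(name: str) -> bool:
    # fullmatch (not search) so nested names like
    # "backbone.layers.12.mixer.norm.weight" are NOT skipped.
    return ROOT_NORM.fullmatch(name) is not None
```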
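The `block_configs` aliasing issue called out for the Qwen3 converter comes from Python list multiplication, which repeats references to the same object rather than copying it; a small demonstration:

```python
# List multiplication repeats references to ONE dict...
shared = [{"no_op": False}] * 3
shared[0]["no_op"] = True  # mutates every "entry"

# ...a per-layer comprehension builds an independent dict each iteration.
independent = [{"no_op": False} for _ in range(3)]
independent[0]["no_op"] = True  # mutates only entry 0
```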
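The push/pop discipline described for `ActivityContext` (append for normal order, insert-at-front for reversed, pop only what this call pushed) can be sketched with a reduced stand-in class; this is an illustrative sketch of the requested behavior, not the project's `ActivityContext`:

```python
from contextlib import contextmanager


class ActivityStack:
    """Minimal sketch: LIFO context stack with a pushed flag."""

    def __init__(self, reverse: bool = False):
        self._items = []
        self._reverse = reverse

    @contextmanager
    def activate(self, item):
        pushed = False
        try:
            # Reversed stacks push to the front, normal stacks to the end.
            if self._reverse:
                self._items.insert(0, item)
            else:
                self._items.append(item)
            pushed = True
            yield item
        finally:
            # Only pop if this call actually pushed an item.
            if pushed:
                self._items.pop(0) if self._reverse else self._items.pop()

    def get_active(self):
        # Returns None for an empty stack instead of raising.
        return self._items[-1] if self._items else None
```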
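The `Warning(...)` vs `warnings.warn(...)` distinction above, as a runnable sketch: constructing a `Warning` merely creates an exception object, while `warnings.warn` actually routes the message through the warnings machinery. The wrapper function here is hypothetical:

```python
import warnings


def warn_missing_stats(moe_stats_file: str) -> None:
    # warnings.warn(...) emits; a bare Warning(...) call would do nothing.
    warnings.warn(
        f"MOE stats file {moe_stats_file} does not exist, can't calculate num active params",
        UserWarning,
        stacklevel=2,
    )
```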
---
Duplicate comments:
In `@modelopt/torch/puzzletron/mip/run_puzzle.py`:
- Around line 396-424: In _assert_valid_config, the checks like
`"subblock_stats_args" not in args` and similar use the in-operator on
argparse.Namespace (causing TypeError); change those presence checks to use
hasattr(args, "subblock_stats_args") or getattr(args, "subblock_stats_args") is
not None for each CLI arg check, while keeping the existing membership checks
against puzzle_profile (which is a dict); update the two conditional blocks that
validate subblock_stats_args and the constraints (mip_constraints,
human_constraints) to use hasattr/getattr on args and keep the puzzle_profile
checks as-is so validation works for both Namespace and profile.
In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py`:
- Around line 404-415: The fallback loader
_load_subblock_configs_from_subblock_library currently returns raw
AttentionConfig/FFNConfig objects but calculate_subblock_stats expects each item
to be a mapping with keys "subblock_config" and "parent_layer_indices"; change
_load_subblock_configs_from_subblock_library to wrap each config into a dict
like {"subblock_config": <config>, "parent_layer_indices": []} (use an empty
list or appropriate default for parent_layer_indices) for both attention_configs
and ffn_configs so the combined subblock_configs list matches the shape expected
by calculate_subblock_stats.
---
Nitpick comments:
In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py`:
- Around line 119-126: The _block_no_op_post_init function assumes decoder_layer
has config, block_configs, and layer_idx; add defensive checks at the start of
_block_no_op_post_init to avoid AttributeError (e.g., if called on a
DummyBlock): verify hasattr(decoder_layer, "config"),
hasattr(decoder_layer.config, "block_configs"), and hasattr(decoder_layer,
"layer_idx") (or use a try/except AttributeError), ensure
decoder_layer.layer_idx is an int within range of
decoder_layer.config.block_configs, and return early if any check fails; then
proceed to access block_config and apply the existing ffn/attention no_op logic
on decoder_layer.norm and decoder_layer.mixer.
- Around line 40-52: get_dynamic_modules currently imports every module under
transformers_modules which can be slow; to fix, add caching and an optional
scope parameter: decorate get_dynamic_modules with functools.lru_cache (or
maintain a module-level cache keyed by module_cls_str and optional scope), and
add an optional argument (e.g., package_whitelist: Optional[Iterable[str]] or
max_results/first_match: int) so the caller can limit search to specific
subpackages or stop after N matches; update the function body to skip packages
not in the whitelist and to short-circuit when max_results is reached, keeping
references to get_dynamic_modules, transformers_modules, matches, and the
import/package iteration logic so changes are easy to locate.
In `@modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py`:
- Line 15: The file-wide "# mypy: ignore-errors" must be removed and replaced by
precise type annotations on the static methods (e.g., add return types to
decoder_layer_cls, input_embedding_name, layer_block_name(index: int), and other
static methods) and only use targeted "# type: ignore" comments for specific
untyped third-party imports; for example declare decoder_layer_cls() ->
Type[Any] or the proper transformer layer class, input_embedding_name() -> str,
layer_block_name(index: int) -> str, and any other methods with explicit return
types so mypy can validate the file while leaving isolated ignores on the import
lines that actually require them.
In `@modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_converter.py`:
- Around line 74-76: Replace the direct print call with the project logging
helper: import mprint from modelopt.torch.puzzletron.tools.logger and change the
print(...) in qwen3_vl_converter.py to mprint(f"Created {len(block_configs)}
block configs for Qwen3-VL MoE (decoder_sparse_step={decoder_sparse_step})") so
the message uses the consistent logger; ensure the import is added near other
imports and no other behavior changes to block_configs or decoder_sparse_step.
In `@modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py`:
- Line 33: The function create_block_configs_from_main_config currently declares
a return type of List[BlockConfig] but actually returns a list of dicts (via
.to_dict()); update the signature to return List[Dict[str, Any]] and add the
necessary typing imports (Dict, Any) so the annotation matches the actual return
value, and scan related converters (e.g., the Llama converter and base converter
methods) to ensure callers expect dicts or adjust them consistently.
In `@modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_model_descriptor.py`:
- Around line 64-65: The method block_config_to_layer_overrides in
qwen3_model_descriptor.py is missing the return type annotation declared on the
base class; update its signature to include -> Dict[str, Any] (import Dict and
Any from typing if not already present) so it matches the base class declaration
in model_descriptor.py (see block_config_to_layer_overrides and BlockConfig) and
restores proper type safety and IDE support.
In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py`:
- Line 27: The module currently sets os.environ["PYTORCH_CUDA_ALLOC_CONF"] at
import time which creates a global side effect; instead create a small helper
(e.g. _set_cuda_alloc_config()) that sets that env var only when needed, remove
the top-level assignment, and call _set_cuda_alloc_config() at the start of
calculate_subblock_stats and calculate_subblock_stats_for_puzzle_dir so the CUDA
config is applied only when those functions run (ensure the helper is idempotent
to avoid repeated sets).
In `@modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py`:
- Around line 165-166: Remove the duplicated assignment of actual_io_workers and
its associated mprint call in init_child_from_parent.py: keep the first
computation (actual_io_workers = max_workers if max_workers else "auto") and the
initial mprint("I/O Settings:..."), and delete the later identical reassignment
and print (the repeated lines around where actual_io_workers is set again near
line 181). This removes the redundant code while leaving any subsequent uses of
actual_io_workers and the mprint calls (function mprint) intact.
- Around line 90-91: Wrap the json.loads call that parses
model_config_overrides_dict in a try-except that catches json.JSONDecodeError,
so malformed JSON yields a clear, actionable error; specifically, around the
block that checks isinstance(model_config_overrides_dict, str) and calls
json.loads, catch JSONDecodeError and raise a ValueError (or log and raise) with
a message that includes that model_config_overrides_dict failed to parse and
include the offending string (or truncated preview) and the original exception
message to aid debugging.
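A sketch of the suggested try/except around `json.loads`; the wrapper `parse_overrides` is hypothetical (the real code parses in place), but it shows the error message including a truncated preview and the original exception:

```python
import json


def parse_overrides(overrides):
    """Accept either a dict or a JSON string; fail with context on bad JSON."""
    if isinstance(overrides, str):
        try:
            overrides = json.loads(overrides)
        except json.JSONDecodeError as e:
            raise ValueError(
                f"Invalid JSON in model_config_overrides_dict: {overrides[:80]!r}: {e}"
            ) from e
    return overrides
```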
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dad8af7f-6d6b-49c2-bbc4-3bd3b55b551b
📒 Files selected for processing (23)
- .pre-commit-config.yaml
- examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/ffn_pruning.yaml
- modelopt/torch/puzzletron/anymodel/models/__init__.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen3/__init__.py
- modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py
- modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen3_vl/__init__.py
- modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_converter.py
- modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
- modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
- modelopt/torch/puzzletron/mip/run_puzzle.py
- modelopt/torch/puzzletron/mip/utils.py
- modelopt/torch/puzzletron/sewing_kit/core.py
- modelopt/torch/puzzletron/sewing_kit/utils.py
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
- pyproject.toml
- tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
- tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-8B/pruning/ffn_pruning.yaml
- tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-VL-30B-A3B-Instruct/pruning/expert_pruning.yaml
✅ Files skipped from review due to trivial changes (5)
- modelopt/torch/puzzletron/anymodel/models/qwen3_vl/init.py
- modelopt/torch/puzzletron/anymodel/models/qwen3/init.py
- examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/ffn_pruning.yaml
- modelopt/torch/puzzletron/mip/utils.py
- modelopt/torch/puzzletron/anymodel/models/init.py
🚧 Files skipped from review as they are similar to previous changes (2)
- .pre-commit-config.yaml
- modelopt/torch/puzzletron/sewing_kit/core.py
def get_dynamic_modules(module_cls_str: str) -> List[Type[nn.Module]]:
    import transformers_modules

    matches = []
    for finder, modname, ispkg in pkgutil.walk_packages(
        transformers_modules.__path__, transformers_modules.__name__ + "."
    ):
        module = importlib.import_module(modname)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            if obj.__name__ == module_cls_str:
                matches.append(obj)

    return matches
🧩 Analysis chain
🌐 Web query:
In Hugging Face Transformers, how is the transformers_modules namespace for trust_remote_code models populated? Is it shared across cached repos/revisions, and does importing every discovered submodule execute top-level code from unrelated cached models?
💡 Result:
transformers_modules is not a “registry” that Transformers eagerly fills; it’s a filesystem-backed Python package tree rooted at HF_MODULES_CACHE (by default under your HF cache), and it’s populated lazily when you load a trust_remote_code=True model.
How it gets populated
When Transformers needs a class from remote code, it calls get_cached_module_file(...), which:
- Ensures HF_MODULES_CACHE exists, contains a top-level __init__.py, and is added to sys.path (init_hf_modules). (raw.githubusercontent.com)
- Creates a package path HF_MODULES_CACHE/transformers_modules/<sanitized_repo_id>/... with __init__.py files (create_dynamic_module). (raw.githubusercontent.com)
- Copies (or downloads+copies) the requested module file (and any relative-imported .py files) into that package tree. (raw.githubusercontent.com)
- Imports the module by file path and executes it via module_spec.loader.exec_module(module) when the class is needed (get_class_in_module). (raw.githubusercontent.com)
Transformers explicitly warns that calling this mechanism “will execute the code in the module file”. (raw.githubusercontent.com)
Is the namespace shared across cached repos / revisions?
Yes, in the sense that there is one shared root package (transformers_modules) under HF_MODULES_CACHE that accumulates subpackages for whatever you’ve loaded on that machine/env. (raw.githubusercontent.com)
But remote-code revisions are not merged: for non-local repos, Transformers places code under an additional commit-hash subfolder, e.g.:
.../transformers_modules/<repo_id>/<commit_hash>/module.py (raw.githubusercontent.com)
So different revisions get distinct import paths.
Does “importing every discovered submodule” execute unrelated cached models’ top-level code?
- Importing a module executes its top-level code (and get_class_in_module uses exec_module, which does exactly that). (raw.githubusercontent.com)
- Transformers itself does not automatically import all cached transformers_modules.* modules; it imports only the specific module needed for the class you're loading. (raw.githubusercontent.com)
- However, if you enumerate the transformers_modules package (e.g., via pkgutil.walk_packages) and then importlib.import_module(...) every discovered module, then yes: you would end up executing top-level code from unrelated cached remote-code modules, because you explicitly imported them. (raw.githubusercontent.com)
Citations:
- [1]–[10]: https://raw.githubusercontent.com/huggingface/transformers/main/src/transformers/dynamic_module_utils.py
Scope dynamic-module discovery to the selected checkpoint.
Lines 39–51 walk the entire transformers_modules namespace and import every discovered module. Since transformers_modules is a shared filesystem cache that accumulates remote-code packages from all previously loaded trust_remote_code=True models, this code unnecessarily executes top-level code from unrelated cached models. The function should discover modules only within the package cached for the specific checkpoint being converted, or from modules already loaded in the current session.
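The scoped discovery the comment asks for can be sketched as follows (the function name and the `checkpoint_pkg` parameter are illustrative, not the repository's actual API):

```python
import importlib
import inspect
import pkgutil
import sys


def get_dynamic_modules_scoped(module_cls_str, checkpoint_pkg=None):
    """Collect classes named module_cls_str without importing unrelated cached repos."""
    # First pass: only inspect remote-code modules already imported this session.
    matches = [
        obj
        for name, module in list(sys.modules.items())
        if name.startswith("transformers_modules.")
        for _, obj in inspect.getmembers(module, inspect.isclass)
        if obj.__name__ == module_cls_str
    ]
    if matches or checkpoint_pkg is None:
        return matches
    # Fallback: walk only the subpackage cached for this specific checkpoint,
    # never the whole shared transformers_modules namespace.
    pkg = importlib.import_module(f"transformers_modules.{checkpoint_pkg}")
    for _finder, modname, _ispkg in pkgutil.walk_packages(pkg.__path__, pkg.__name__ + "."):
        module = importlib.import_module(modname)
        for _, obj in inspect.getmembers(module, inspect.isclass):
            if obj.__name__ == module_cls_str:
                matches.append(obj)
    return matches
```

This keeps the common case (class already loaded via trust_remote_code earlier in the process) free of any imports at all, and only ever executes top-level code from the one checkpoint's cached package.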
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py`
around lines 39 - 51, The dynamic discovery currently imports every module in
the shared transformers_modules cache; change get_dynamic_modules to first
search already-imported modules in sys.modules whose names start with
transformers_modules.__name__ + "." and collect matching classes, and only if
none found (or if a specific checkpoint package name is provided) limit
pkgutil.walk_packages to that specific checkpoint subpackage instead of walking
transformers_modules.__path__ entirely; update uses of pkgutil.walk_packages and
importlib.import_module in get_dynamic_modules to restrict imports to that
narrower package scope (reference symbols: get_dynamic_modules, module_cls_str,
transformers_modules, pkgutil.walk_packages, importlib.import_module).
if block_config.ffn is not None and block_config.ffn.moe is not None:
    override_kwargs["moe_intermediate_size"] = block_config.ffn.moe.expert_intermediate_dim
    override_kwargs["n_routed_experts"] = block_config.ffn.moe.num_local_experts
Don't write None into MoE constructor overrides.
Unlike the dense FFN and attention branches, Lines 91-92 forward the MoE fields without checking for None. A partial BlockConfig.ffn.moe will overwrite model defaults with None, which can break block construction and sweep configs that change only one MoE dimension.
🐛 Proposed fix
- if block_config.ffn is not None and block_config.ffn.moe is not None:
- override_kwargs["moe_intermediate_size"] = block_config.ffn.moe.expert_intermediate_dim
- override_kwargs["n_routed_experts"] = block_config.ffn.moe.num_local_experts
+ if block_config.ffn is not None and block_config.ffn.moe is not None:
+ if block_config.ffn.moe.expert_intermediate_dim is not None:
+ override_kwargs["moe_intermediate_size"] = (
+ block_config.ffn.moe.expert_intermediate_dim
+ )
+ if block_config.ffn.moe.num_local_experts is not None:
+            override_kwargs["n_routed_experts"] = block_config.ffn.moe.num_local_experts

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py`
around lines 90 - 93, The MoE override assignments in the block-construction
path are writing None into override_kwargs when a partial BlockConfig.ffn.moe is
provided; update the logic in the block config handling (the code that builds
override_kwargs for MoE) to check each MoE field individually before setting
override_kwargs["moe_intermediate_size"] and override_kwargs["n_routed_experts"]
(i.e., only assign block_config.ffn.moe.expert_intermediate_dim and
block_config.ffn.moe.num_local_experts if they are not None), so that
unspecified MoE values do not overwrite model defaults in the NemotronH_v2 model
descriptor.
valid_weight_groups = {}
for group, names in weight_groups.items():
    if len(names) == 1:
        only_name = names[0]
        if only_name.endswith("norm.weight") and "layers" in only_name:
            # Skip and don't append this group to valid_weight_groups
            continue
Narrow the singleton-group filter to the duplicated root norm only.
Line 172 drops any one-element group ending in norm.weight, which also matches unique tensors such as backbone.layers.<i>.mixer.norm.weight. If a subblock legitimately has only that weight, get_weight_groups() will silently omit it from the checkpoint map. Match only backbone.layers.<i>.norm.weight here.
🐛 Proposed fix
- if only_name.endswith("norm.weight") and "layers" in only_name:
+ if re.fullmatch(r"backbone\.layers\.\d+\.norm\.weight", only_name):
# Skip and don't append this group to valid_weight_groups
continue🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py`
around lines 168 - 174, The loop that filters singleton groups (weight_groups ->
valid_weight_groups) is too broad: it currently skips any single entry ending
with "norm.weight" and containing "layers", which can accidentally drop valid
tensors like "backbone.layers.<i>.mixer.norm.weight". Change the condition in
that loop (variables: weight_groups, valid_weight_groups, only_name) to only
skip when only_name exactly matches the root-layer norm pattern, e.g. use a
regex or precise string check to match "backbone.layers.<number>.norm.weight"
(backbone\.layers\.\d+\.norm\.weight) and only continue in that case. Ensure
other single-entry names that end with "norm.weight" but include additional
subblocks (like ".mixer.") are preserved.
from transformers.models.qwen2.modeling_qwen2 import (
    Qwen2DecoderLayer,
    Qwen2ForCausalLM,
    Qwen2RotaryEmbedding,
)
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Verify module-level HF imports in descriptor modules
rg -n --type=py '^\s*(from|import)\s+transformers' modelopt/torch/puzzletron/anymodel/models -C2
# Locate existing import_plugin usage to align this file with repo pattern
rg -n --type=py '\bimport_plugin\(' modelopt -C2

Repository: NVIDIA/Model-Optimizer
Length of output: 24064
🏁 Script executed:
# Read the full qwen2_model_descriptor.py to see how imports are used
head -100 modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py

Repository: NVIDIA/Model-Optimizer
Length of output: 3511
🏁 Script executed:
# Check if Qwen2ForCausalLM, Qwen2DecoderLayer are used in type hints at module level
rg -A 5 'Qwen2ForCausalLM|Qwen2DecoderLayer|Qwen2RotaryEmbedding' modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py

Repository: NVIDIA/Model-Optimizer
Length of output: 1140
🏁 Script executed:
# Check how descriptor modules are loaded - search for registration/loading patterns
rg -n 'qwen2_model_descriptor|register.*descriptor' modelopt/torch/puzzletron -C 3

Repository: NVIDIA/Model-Optimizer
Length of output: 2186
Use lazy imports to gate optional transformers dependency in this descriptor module.
Lines 24–28 eagerly import Qwen2 classes at module level, making the descriptor fragile in environments without transformers installed. Per coding guidelines, optional dependencies must use lazy imports.
Move imports to function scope:
- Use a TYPE_CHECKING guard for type hints in method signatures
- Import the class at each usage site where it's instantiated or returned

This pattern aligns with the codebase's import_plugin() approach used throughout optional integrations.
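The lazy-import pattern can be sketched like this (`build_model` is a hypothetical usage site, not the descriptor's actual method):

```python
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers; no runtime transformers dependency.
    from transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM


def build_model(checkpoint: str) -> Qwen2ForCausalLM:
    # Import at the usage site so this module still imports cleanly in
    # environments where transformers is not installed.
    from transformers.models.qwen2.modeling_qwen2 import Qwen2ForCausalLM

    return Qwen2ForCausalLM.from_pretrained(checkpoint)
```

With `from __future__ import annotations`, the return annotation stays a string at runtime, so the module can be imported and type-checked without transformers being present.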
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py`
around lines 24 - 28, The module currently does eager, top-level imports of
Qwen2DecoderLayer, Qwen2ForCausalLM, and Qwen2RotaryEmbedding which breaks when
transformers is absent; change to lazy imports by removing those module-level
imports and instead: add a TYPE_CHECKING guard to import those types for type
hints only, and import the actual classes inside the functions/methods that
instantiate or return them (e.g., wherever Qwen2ForCausalLM, Qwen2DecoderLayer,
or Qwen2RotaryEmbedding are used in this descriptor), or use the repository's
import_plugin() pattern to load transformers at call time so the optional
dependency is gated at runtime. Ensure all usage sites reference the locally
imported symbols and keep type hints under TYPE_CHECKING.
from transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe import (
    Qwen3VLMoeTextDecoderLayer,
    Qwen3VLMoeTextRotaryEmbedding,
    Qwen3VLMoeVisionRotaryEmbedding,
)
🧩 Analysis chain
🌐 Web query:
Does transformers library version 4.56 support Qwen3VLMoe model with module path transformers.models.qwen3_vl_moe?
💡 Result:
No. In the Transformers 4.56.x line, the transformers.models.qwen3_vl_moe package/module does not exist—src/transformers/models/qwen3_vl_moe was added in 4.57.0 (it appears in the 4.56.2 → 4.57.0 diff as a newly added directory). [1] The 4.57.0 release notes also list Qwen3-related additions, consistent with support landing in 4.57+. [2]
So you need Transformers >= 4.57.0 for that module path/model support. [1][3]
Sources: [1] [2] [3]
🏁 Script executed:
# Find dependency configuration files
git ls-files | grep -E '(pyproject\.toml|requirements|setup\.py)' | head -20

Repository: NVIDIA/Model-Optimizer
Length of output: 867
🏁 Script executed:
# Check for main project config files at root
ls -la pyproject.toml setup.py setup.cfg 2>/dev/null || echo "Checking for them..."
# Also search for pyproject.toml anywhere
find . -maxdepth 2 -name "pyproject.toml" -type f

Repository: NVIDIA/Model-Optimizer
Length of output: 167
🏁 Script executed:
# Check the transformers dependency version in pyproject.toml
cat pyproject.toml | grep -A 5 -B 5 "transformers"

Repository: NVIDIA/Model-Optimizer
Length of output: 578
🏁 Script executed:
# Check the full file to see if imports are at module level and how the descriptor is registered
cat -n modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py | head -50

Repository: NVIDIA/Model-Optimizer
Length of output: 2179
🏁 Script executed:
# Check if this descriptor module is imported or registered anywhere
rg "qwen3_vl_model_descriptor\|Qwen3VLModelDescriptor" --type py | head -20

Repository: NVIDIA/Model-Optimizer
Length of output: 48
🏁 Script executed:
# Check how model descriptors are discovered/registered
find modelopt/torch/puzzletron/anymodel/models -name "__init__.py" -exec cat {} \;

Repository: NVIDIA/Model-Optimizer
Length of output: 8632
🏁 Script executed:
# Check what's in qwen3_vl __init__.py to confirm eager import
cat modelopt/torch/puzzletron/anymodel/models/qwen3_vl/__init__.py

Repository: NVIDIA/Model-Optimizer
Length of output: 965
Update minimum transformers version or use lazy imports for Qwen3-VL MoE imports.
The import at lines 22-26 uses transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe, which was added in transformers 4.57.0. However, pyproject.toml specifies transformers>=4.56,<5.0, allowing 4.56.x where this module does not exist. This will cause an immediate ModuleNotFoundError when the module is loaded.
Either update pyproject.toml to require transformers>=4.57.0 or wrap these imports in a try/except block with lazy loading to handle both versions gracefully.
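The try/except option can be sketched as follows (the `HAS_QWEN3_VL_MOE` flag and the helper function are illustrative names, not the repository's existing code):

```python
# Gate the 4.57+-only module so the descriptor still imports on transformers 4.56.x
# (or with transformers absent entirely).
try:
    from transformers.models.qwen3_vl_moe.modeling_qwen3_vl_moe import (
        Qwen3VLMoeTextDecoderLayer,
        Qwen3VLMoeTextRotaryEmbedding,
        Qwen3VLMoeVisionRotaryEmbedding,
    )

    HAS_QWEN3_VL_MOE = True
except ImportError:
    Qwen3VLMoeTextDecoderLayer = None
    Qwen3VLMoeTextRotaryEmbedding = None
    Qwen3VLMoeVisionRotaryEmbedding = None
    HAS_QWEN3_VL_MOE = False


def require_qwen3_vl_moe() -> None:
    """Raise a clear error at use time instead of at import time."""
    if not HAS_QWEN3_VL_MOE:
        raise ImportError(
            "Qwen3-VL MoE support requires transformers>=4.57.0; "
            "please upgrade transformers."
        )
```

Call `require_qwen3_vl_moe()` at the top of any function that actually instantiates these classes, so environments on older transformers get an actionable message rather than a ModuleNotFoundError on import.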
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py`
around lines 22 - 26, The module-level imports of Qwen3VLMoeTextDecoderLayer,
Qwen3VLMoeTextRotaryEmbedding, and Qwen3VLMoeVisionRotaryEmbedding will raise
ModuleNotFoundError on transformers 4.56.x; either bump the transformers
requirement to >=4.57.0 in pyproject.toml, or change the top-level import in
qwen3_vl_model_descriptor.py to a guarded lazy import: wrap the current import
block for
Qwen3VLMoeTextDecoderLayer/Qwen3VLMoeTextRotaryEmbedding/Qwen3VLMoeVisionRotaryEmbedding
in try/except ImportError and fall back to importing alternatives or deferring
import inside the functions/classes that use them so the module can load under
older transformers versions.
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# mypy: ignore-errors
Avoid file-wide mypy: ignore-errors.
This disables type checking for the entire module and makes it much easier to miss bugs in a new shared utility surface. Prefer targeted # type: ignore[...] on the specific Torch internals that mypy cannot model.
As per coding guidelines, "Ensure type hints are properly annotated for static type checking with mypy".
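A minimal illustration of the narrowly-scoped alternative (`module_version` is a hypothetical helper, not code from this PR):

```python
from types import ModuleType


def module_version(mod: ModuleType) -> str:
    """Read __version__, an attribute mypy does not know exists on ModuleType."""
    # Suppress only this one error code on this one line, instead of disabling
    # type checking for the whole file with "# mypy: ignore-errors".
    return mod.__version__  # type: ignore[attr-defined]
```

The `[attr-defined]` code keeps mypy active everywhere else in the file, so genuine type errors in the surrounding utility code still surface.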
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/sewing_kit/utils.py` at line 15, Remove the
file-wide directive "# mypy: ignore-errors" at the top of the sewing_kit.utils
module; instead run mypy to locate the specific type errors, add precise type
annotations to public functions/classes in this module, and apply
narrowly-scoped ignores (e.g., "# type: ignore[assignment]" or "# type:
ignore[arg-type]") only on the specific Torch internals or import lines that
mypy cannot model; ensure exported symbols have explicit return/argument types
so the module is type-checkable without disabling mypy globally.
def __init__(self, max_depth: Optional[int] = None, no_duplicates=False, reversed=False):
    self.activity_stack: list[T] = []
    self.max_depth = max_depth
    self.no_duplicates = no_duplicates
    self.reversed = reversed
    self.head_index = 0 if self.reversed else -1

def __contains__(self, value: T) -> bool:
    result = value in self.activity_stack
    return result

def __call__(self, value: T) -> ContextManager:
    @contextmanager
    def fn():
        try:
            if self.no_duplicates and value in self.activity_stack:
                raise ActivityContextDuplicateException(
                    f"Activity stack cannot have a duplicate of item {value}"
                )

            self.activity_stack.insert(self.head_index, value)

            if self.max_depth is not None and len(self) > self.max_depth:
                raise ActivityContextMaxDepthException(
                    f"Activity stack exceeds max depth of {self.max_depth}"
                )

            yield
        finally:
            assert self.is_active()
            self.activity_stack.pop(self.head_index)

    return fn()
🧩 Analysis chain
🏁 Script executed:
find . -name "utils.py" -path "*/puzzletron/sewing_kit/*" -type f

Repository: NVIDIA/Model-Optimizer
Length of output: 113
🏁 Script executed:
cat -n "modelopt/torch/puzzletron/sewing_kit/utils.py" | head -160

Repository: NVIDIA/Model-Optimizer
Length of output: 5831
🏁 Script executed:
grep -n "get_active" "modelopt/torch/puzzletron/sewing_kit/utils.py"

Repository: NVIDIA/Model-Optimizer
Length of output: 110
🏁 Script executed:
find . -name "core.py" -path "*/passage/*" -type f

Repository: NVIDIA/Model-Optimizer
Length of output: 120
🏁 Script executed:
grep -n "ActivityContext\|max_depth\|no_duplicates" "modelopt/torch/puzzletron/sewing_kit/passage/core.py" | head -20

Repository: NVIDIA/Model-Optimizer
Length of output: 248
🏁 Script executed:
rg -A 3 -B 3 "ActivityContext" "modelopt/torch/puzzletron/sewing_kit/passage/core.py"

Repository: NVIDIA/Model-Optimizer
Length of output: 419
🏁 Script executed:
python3 << 'EOF'
# Test list.insert(-1, value) behavior
test_list = [1, 2, 3]
print("Original list:", test_list)
# Insert -1 with value 99
test_list_copy = test_list.copy()
test_list_copy.insert(-1, 99)
print("After insert(-1, 99):", test_list_copy)
print("Does insert(-1, value) insert BEFORE the last element? Yes")
# What append does
test_list_copy2 = test_list.copy()
test_list_copy2.append(99)
print("After append(99):", test_list_copy2)
# Test with default reversed=False (head_index = -1)
print("\n--- Simulating ActivityContext with reversed=False (head_index=-1) ---")
stack = []
stack.insert(-1, "A") # head_index=-1
print(f"After insert(-1, 'A'): {stack}")
stack.insert(-1, "B") # head_index=-1
print(f"After insert(-1, 'B'): {stack}")
stack.insert(-1, "C") # head_index=-1
print(f"After insert(-1, 'C'): {stack}")
print(f"pop(-1) removes: {stack.pop(-1)}")
print(f"Stack after pop(-1): {stack}")
print(f"pop(-1) removes: {stack.pop(-1)}")
print(f"Stack after pop(-1): {stack}")
# Show expected behavior with append/pop
print("\n--- Expected behavior (append, pop(-1)) ---")
stack2 = []
stack2.append("A")
print(f"After append('A'): {stack2}")
stack2.append("B")
print(f"After append('B'): {stack2}")
stack2.append("C")
print(f"After append('C'): {stack2}")
print(f"pop(-1) removes: {stack2.pop(-1)}")
print(f"Stack after pop(-1): {stack2}")
EOF

Repository: NVIDIA/Model-Optimizer
Length of output: 674
🏁 Script executed:
grep -B 10 -A 10 "ActivityContext(" "modelopt/torch/puzzletron/sewing_kit/passage/core.py" | head -60

Repository: NVIDIA/Model-Optimizer
Length of output: 48
🏁 Script executed:
grep -n "ActivityContext\|max_depth\|no_duplicates" "modelopt/torch/puzzletron/sewing_kit/passage/core.py"

Repository: NVIDIA/Model-Optimizer
Length of output: 248
Fix ActivityContext enter/exit bookkeeping — three critical bugs.
- list.insert(-1, value) inserts before the last element, not at the end. With reversed=False (default), the stack winds up as ['B', 'C', 'A'] instead of ['A', 'B', 'C'], causing nested contexts to unwind in the wrong order.
- The finally block unconditionally pops even when ActivityContextDuplicateException is raised before the insert happens, removing an already-active entry that was never pushed in this call.
- Line 145: if self.is_active: checks the method object without calling it, so the condition is always truthy. This means get_active() never returns None for empty stacks and instead attempts to index an empty list, raising IndexError.
These issues manifest in passage/core.py where create_fn_context uses max_depth=1 and active_passages_context uses no_duplicates=True.
Suggested fix
class ActivityContext(Generic[T]):
def __call__(self, value: T) -> ContextManager:
        @contextmanager
def fn():
+ inserted = False
try:
if self.no_duplicates and value in self.activity_stack:
raise ActivityContextDuplicateException(
f"Activity stack cannot have a duplicate of item {value}"
)
- self.activity_stack.insert(self.head_index, value)
+ if self.reversed:
+ self.activity_stack.insert(0, value)
+ else:
+ self.activity_stack.append(value)
+ inserted = True
if self.max_depth is not None and len(self) > self.max_depth:
raise ActivityContextMaxDepthException(
f"Activity stack exceeds max depth of {self.max_depth}"
)
yield
finally:
- assert self.is_active()
- self.activity_stack.pop(self.head_index)
+ if inserted:
+ assert self.is_active()
+ self.activity_stack.pop(0 if self.reversed else -1)
return fn()
@@
def get_active(self) -> Optional[T]:
- if self.is_active:
+ if self.is_active():
return self.activity_stack[-1]
- else:
- return None
+ return None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/sewing_kit/utils.py` around lines 94 - 126, Fix
three bugs in ActivityContext: in __init__/__call__ stop using
insert(self.head_index) for normal order — for reversed=True insert at index 0,
otherwise use list.append to push to the end so nested contexts unwind LIFO; in
__call__ only pop when this call actually pushed an item (set a local flag
pushed=False, set pushed=True after insert/append, and in finally only pop if
pushed using pop(0) for reversed or pop() for normal); and correct the is_active
check elsewhere to call the method (use self.is_active() not self.is_active) so
get_active() can return None for empty stacks. Ensure references changed affect
__init__, __call__, is_active, and get_active.
def _get_device_for_distributed(
    group: Optional[torch.distributed.ProcessGroup] = None,
) -> str:
    """
    Determine the appropriate device for distributed communication based on the backend.

    NCCL backend requires CUDA tensors, while Gloo supports both CPU and CUDA.
    """
    if not torch.distributed.is_initialized():
        return "cpu"

    backend = torch.distributed.get_backend(group)
    if backend == "nccl":
        # NCCL requires CUDA tensors
        return torch.cuda.current_device()
    else:
        # Gloo and other backends support CPU tensors
        return "cpu"
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the function
find . -name "utils.py" -path "*/puzzletron/sewing_kit/*" | head -5

Repository: NVIDIA/Model-Optimizer
Length of output: 113
🏁 Script executed:
# Search for the file containing the function
find . -type f -name "utils.py" | xargs grep -l "_get_device_for_distributed" 2>/dev/null

Repository: NVIDIA/Model-Optimizer
Length of output: 113
🏁 Script executed:
# Also check the entire puzzletron directory structure
find . -type d -name "puzzletron" 2>/dev/null

Repository: NVIDIA/Model-Optimizer
Length of output: 209
🏁 Script executed:
# Examine the function implementation
cat -n ./modelopt/torch/puzzletron/sewing_kit/utils.py | sed -n '350,380p'

Repository: NVIDIA/Model-Optimizer
Length of output: 1165
🏁 Script executed:
# Find all callers of _get_device_for_distributed
rg "_get_device_for_distributed" -B 2 -A 5 --no-heading

Repository: NVIDIA/Model-Optimizer
Length of output: 2217
🏁 Script executed:
# Search for distributed_recv_obj and _object_to_tensor to see how the device is used
rg "distributed_recv_obj|_object_to_tensor" -B 2 -A 10 --no-heading -t py

Repository: NVIDIA/Model-Optimizer
Length of output: 10214
🏁 Script executed:
# Let me check the exact line where .to(device) is called to understand the context better
cat -n ./modelopt/torch/puzzletron/sewing_kit/utils.py | sed -n '395,415p'

Repository: NVIDIA/Model-Optimizer
Length of output: 724
🌐 Web query:
PyTorch torch.cuda.current_device return type and Tensor.to() device parameter documentation
💡 Result:
- torch.cuda.current_device() return type: it returns the index of the currently selected CUDA device as a Python int. [1]
- Tensor.to() "device" argument (what you can pass): Tensor.to() supports a call form to(device=None, dtype=None, ...), which returns a tensor on the specified device (and optionally dtype). [2]
  - For the device you can pass a torch.device, a properly formatted device string (e.g. "cuda:1", "cpu", "mps"), or an integer device ordinal (interpreted as the current accelerator type's device index). [3]
Sources: PyTorch docs for torch.cuda.current_device() [1], torch.Tensor.to() [2], and torch.device / device argument conventions [3].
Fix the function return type and use consistent torch.device objects.
The function currently has an inconsistent return type—it returns a string "cpu" in some branches but torch.cuda.current_device() (an integer) in others. While PyTorch's Tensor.to() and _object_to_tensor() technically accept integer device indices, the documented and preferred approach is to pass torch.device objects for clarity and consistency. The return type annotation -> str is also incorrect. Update the function to return torch.device objects consistently.
🚚 Suggested fix
def _get_device_for_distributed(
group: Optional[torch.distributed.ProcessGroup] = None,
-) -> str:
+) -> torch.device:
"""
Determine the appropriate device for distributed communication based on the backend.
NCCL backend requires CUDA tensors, while Gloo supports both CPU and CUDA.
"""
if not torch.distributed.is_initialized():
- return "cpu"
+ return torch.device("cpu")
backend = torch.distributed.get_backend(group)
if backend == "nccl":
# NCCL requires CUDA tensors
- return torch.cuda.current_device()
+ return torch.device("cuda", torch.cuda.current_device())
else:
# Gloo and other backends support CPU tensors
- return "cpu"
+        return torch.device("cpu")

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/sewing_kit/utils.py` around lines 357 - 373, Update
_get_device_for_distributed to consistently return torch.device objects (change
the return type annotation from -> str to -> torch.device); when not initialized
return torch.device("cpu"), and when backend is "nccl" return
torch.device("cuda", torch.cuda.current_device()) (or torch.device("cuda",
index) using torch.cuda.current_device()) so both branches return torch.device
instances and usage with Tensor.to() is consistent.
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py (outdated; resolved)
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Actionable comments posted: 7
🧹 Nitpick comments (8)
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py (2)
60-61: Confusing TypeVar redefinition.

T_DataClass is defined as a TypeVar at line 61, then shadowed at line 482 with a type annotation. The line 482 syntax T_DataClass: TypeVar = Type[dataclasses.dataclass] is confusing: it annotates T_DataClass as having type TypeVar but assigns a Type value. If line 482 is intended as a type alias, consider using TypeAlias:

♻️ Proposed fix

-T_DataClass: TypeVar = Type[dataclasses.dataclass]
+# Type alias for dataclass types (used in _dataclass_from_dict)
+DataClassType = Type[T_DataClass]

Or simply remove line 482 and use the existing T_DataClass TypeVar with the function's generic signature.

Also applies to: 482-482
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines 60 - 61, T_DataClass is defined as a TypeVar at the top and then re-used incorrectly at the later declaration `T_DataClass: TypeVar = Type[dataclasses.dataclass]`, which shadows and confuses the type system; remove the second declaration and either (A) use the original TypeVar `T_DataClass` in the generic function signatures that need it, or (B) if you meant a type alias for dataclass types, replace the second declaration with a proper alias using `TypeAlias` (e.g., `T_DataClass: TypeAlias = Type[dataclasses.dataclass]`). Ensure all functions referencing `T_DataClass` (the generic parameter) import and use the original TypeVar instead of the mistaken reassignment.
28-28: Module-level side effect may cause unexpected behavior.

Setting os.environ["PYTORCH_CUDA_ALLOC_CONF"] at import time affects all subsequent PyTorch CUDA allocations globally, even for code that only imports this module without using it. Consider moving this to within the functions that require it, or making it configurable.

♻️ Proposed refactor

-os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
-
 import pandas as pd
 import torch

Then set it within calculate_subblock_stats_for_puzzle_dir or launch_calc_subblock_stats if needed:

def launch_calc_subblock_stats(cfg: DictConfig) -> None:
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    # ... rest of function
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` at line 28, The module currently sets os.environ["PYTORCH_CUDA_ALLOC_CONF"] at import time which produces a global side effect; move this environment change into the runtime path that actually needs it (e.g., inside launch_calc_subblock_stats or calculate_subblock_stats_for_puzzle_dir) and use os.environ.setdefault or assign it at the start of that function so importing the module no longer mutates global process state; update references to the top-level os.environ assignment (the current module-level line) by removing it and adding the environment setup at the start of the specified functions (launch_calc_subblock_stats / calculate_subblock_stats_for_puzzle_dir).modelopt/torch/puzzletron/mip/run_puzzle.py (1)
509-529: Typo in function name: "puzle" should be "puzzle". The function `gather_multi_layer_puzle_metrics` is missing a 'z' in "puzzle". This inconsistency may cause confusion.

♻️ Proposed fix

```diff
-def gather_multi_layer_puzle_metrics(
+def gather_multi_layer_puzzle_metrics(
     single_replacement_validation_dir: Path,
 ) -> MultiLayerPuzzleMetrics:
```

Also update the call site at line 450:

```diff
-    gathered_metrics = gather_multi_layer_puzle_metrics(
+    gathered_metrics = gather_multi_layer_puzzle_metrics(
         args.single_block_replacement_validation_dir
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/mip/run_puzzle.py` around lines 509 - 529, Function name has a typo: rename gather_multi_layer_puzle_metrics to gather_multi_layer_puzzle_metrics; update the function definition and all call sites that reference gather_multi_layer_puzle_metrics to use the corrected name, and adjust any imports/exports or type annotations (e.g., MultiLayerPuzzleMetrics) that reference the old symbol so references remain consistent.

modelopt/torch/puzzletron/tools/logger.py (2)
89-100: Stack frame depth is hardcoded and fragile. The `get_caller_location()` helper relies on exactly 3 `f_back` calls to reach the original caller. If the helper functions (`aprint`, `mprint`, etc.) are refactored or if `dist_log` is called differently, this will silently report the wrong location.

Consider making the depth configurable or using `inspect.stack()` with a known function name anchor.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/logger.py` around lines 89 - 100, get_caller_location currently walks a fixed 3-frame chain which is fragile; update get_caller_location to inspect the full call stack (e.g., via inspect.stack()) and locate the first frame that is not inside the logger utilities (exclude function names like "get_caller_location", "aprint", "mprint", "dist_log" and this module's filename) or accept an optional depth parameter, then return that frame's filename:lineno; modify references to get_caller_location accordingly so caller location resolution is robust to refactors.
103-112: Global logger class mutation at import time. Calling `logging.setLoggerClass(DistributedLogger)` affects all subsequent `getLogger()` calls process-wide, which may surprise users who import this module indirectly. This is intentional based on the design (a relevant snippet shows `distillation_provider.py` relies on it), but worth documenting in the module docstring.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/logger.py` around lines 103 - 112, The module currently calls logging.setLoggerClass(DistributedLogger) at import time which mutates the global logger class for the whole process; update the module-level docstring to explicitly state this side-effect and its implications (mention DistributedLogger, logging.setLoggerClass, and that subsequent logging.getLogger() calls will use the custom class), describing why this choice was made and pointing callers to an alternative (e.g., an explicit init function) if they need to avoid the global mutation.

modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py (3)
263-263: Use explicit exception instead of `assert` for input validation. `assert` statements can be disabled with `python -O`. For runtime input validation, raise an explicit exception.

Proposed fix

```diff
-    assert solutions_path.exists(), f"{solutions_path=} does not exist"
+    if not solutions_path.exists():
+        raise FileNotFoundError(f"{solutions_path=} does not exist")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py` at line 263, Replace the assert used for input validation (the line referencing solutions_path) with an explicit exception: check whether solutions_path.exists() and if not raise a FileNotFoundError (or ValueError) with a clear message that includes solutions_path; update the code around the validation (the block using solutions_path) to use this explicit raise so the check cannot be skipped by running Python with optimizations.
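A quick demonstration of why the explicit raise matters: `assert` disappears under `python -O`, while the exception below always fires (the helper name is illustrative, not the reviewed function):

```python
from pathlib import Path

def check_solutions_path(solutions_path: Path) -> Path:
    # This check survives `python -O`; an assert would be stripped.
    if not solutions_path.exists():
        raise FileNotFoundError(f"{solutions_path=} does not exist")
    return solutions_path

caught = False
try:
    check_solutions_path(Path("/nonexistent/solutions.json"))
except FileNotFoundError:
    caught = True
```
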
284-284: Use `mprint` instead of `print()`. This file imports and uses distributed utilities but uses a bare `print()` here. For consistency with the rest of the Puzzletron codebase, consider using `mprint` from the logger module to ensure only the main rank prints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py` at line 284, Replace the bare print call that logs sorted solutions (the line printing f"sorted solutions by {sort_solutions_by}. {vals[:10]=} {vals[-10:]=}") with the distributed-aware logger mprint so only main rank prints; import mprint from the logger module if it's not already imported and call mprint with the same formatted message, referencing the existing variables sort_solutions_by and vals to preserve output content and context.
188-191: TODO comment for symlink checkpoint saving. There's a TODO indicating that `save_checkpoint_as_symlinks` is not currently supported.

Would you like me to open an issue to track implementing symlink-based checkpoint saving?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py` around lines 188 - 191, The TODO about symlink-based checkpoint saving indicates missing work around save_checkpoint_as_symlinks inside the dist.is_master() branch; open a tracking issue titled "Support symlink checkpoint saving in Puzzletron" that references save_checkpoint_as_symlinks and the checkpoint save flow used in validate_puzzle_with_multi_replacements.py (the dist.is_master() block) and documents expected behavior, platforms, and possible implementation approaches; meanwhile, replace the silent pass with an explicit runtime-visible note (e.g., a processLogger.warning or raising NotImplementedError) and add a TODO comment that references the new issue ID so callers know symlink saving is intentionally unsupported for now.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/prune/importance_hooks/compare_module_outputs.py`:
- Around line 184-203: The --trust-inputs flag is inverted causing unsafe
default loading; update the two torch.load calls in compare_module_outputs.py
(the ref_data and comp_data loads) to pass weights_only=not args.trust_inputs so
that by default weights_only=True (safe) and when the user supplies
--trust-inputs it becomes weights_only=False (trusted), and adjust the
parser.add_argument help text if needed to reflect this semantics.
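The intended polarity can be checked with a tiny argparse sketch; `torch.load(..., weights_only=not args.trust_inputs)` would then be safe by default (the flag name follows the comment; the rest is illustrative):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--trust-inputs",
    action="store_true",
    help="Allow full unpickling of input files; only use with trusted sources.",
)

# Default: flag absent, so weights_only=True (safe loading).
args = parser.parse_args([])
default_weights_only = not args.trust_inputs

# Opt-in: flag present, so weights_only=False (trusted loading).
args = parser.parse_args(["--trust-inputs"])
trusted_weights_only = not args.trust_inputs
```
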
In `@modelopt/torch/puzzletron/mip/run_puzzle.py`:
- Around line 408-412: Typo in the error string: update the message in the
run_puzzle.py block that checks for "subblock_stats_args" (the if that
references args and puzzle_profile and calls mprint/sys.exit) to refer to
"subblock_stats_args" instead of the incorrect "subblock_stats_arrs"; locate the
check using the identifiers args, puzzle_profile, and mprint and change only the
quoted error text to the correct key name.
In `@modelopt/torch/puzzletron/tools/checkpoint_utils.py`:
- Around line 107-118: The function init_empty_module changes global dtype via
torch.set_default_dtype which is not thread-safe; protect the set/restore and
the call to skip_init by introducing a module-level threading.Lock (e.g.,
_default_dtype_lock) and wrap the torch.set_default_dtype(...) / skip_init(...)
/ restore sequence in a with _default_dtype_lock: block (use try/finally to
ensure the original dtype is restored), so concurrent calls cannot interleave;
reference init_empty_module, torch.set_default_dtype, skip_init, and the new
_default_dtype_lock in your change.
- Around line 146-156: The code accesses model_config.block_configs directly
which can raise AttributeError if that attribute is absent; change the check to
safely query the attribute (e.g., use getattr(model_config, "block_configs",
None) or hasattr(model_config, "block_configs")) in the block inside the try in
the function that calls load_model_config (referencing model_config,
checkpoint_dir, trust_remote_code), warn and return False when the attribute is
missing instead of allowing an exception to be raised; keep the existing warning
text but ensure it triggers when the attribute is absent.
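The safe-attribute pattern the prompt describes, sketched with a dummy config object (`ModelConfig` and `has_block_configs` are hypothetical stand-ins):

```python
class ModelConfig:
    """Dummy stand-in for a loaded model config."""

def has_block_configs(model_config) -> bool:
    # getattr with a default never raises AttributeError, so a missing
    # attribute becomes a warning + False instead of a crash.
    block_configs = getattr(model_config, "block_configs", None)
    if block_configs is None:
        print("warning: model config has no block_configs attribute")
        return False
    return True

plain = ModelConfig()
configured = ModelConfig()
configured.block_configs = ["attention", "ffn"]
```
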
In `@modelopt/torch/puzzletron/tools/logger.py`:
- Around line 62-66: The NotImplementedError raised when validating the ranks
parameter incorrectly omits "last" from the displayed allowed choices; update
the error message in the validation block that checks variable ranks (the raise
NotImplementedError(...) near the ranks check) to include "last" in the list of
valid choices (e.g., "['all', 'main', 'local_main', 'last']") or, better,
reference a single ALLOWED_RANKS constant used for the check to keep the message
and the validation in sync.
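Driving both the membership check and the error message from one constant keeps them in sync; a minimal sketch (the constant and function names are assumptions):

```python
ALLOWED_RANKS = ("all", "main", "local_main", "last")

def validate_ranks(ranks: str) -> str:
    # The same tuple feeds the check and the error text,
    # so adding a new choice can never leave the message stale.
    if ranks not in ALLOWED_RANKS:
        raise NotImplementedError(f"ranks={ranks!r} must be one of {list(ALLOWED_RANKS)}")
    return ranks

ok = validate_ranks("last")
rejected = False
try:
    validate_ranks("first")
except NotImplementedError:
    rejected = True
```
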
In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`:
- Around line 241-242: There is a bare "except:" in the try/except block in
validate_puzzle_with_multi_replacements.py which will catch BaseException
subclasses (e.g., KeyboardInterrupt/SystemExit); replace "except:" with "except
Exception:" to only catch regular exceptions (keep the existing "pass" or add
minimal handling/logging if desired) so that system-level exceptions can
propagate; locate the try/except around the puzzle validation logic (the bare
except at lines shown) and make this single-line change.
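Why `except Exception:` rather than a bare `except:`: `SystemExit` and `KeyboardInterrupt` derive from `BaseException`, not `Exception`, so they escape the handler. A self-contained demonstration:

```python
def swallow_regular_errors(fn) -> str:
    # `except Exception` catches ordinary errors only; a bare `except:`
    # would also trap SystemExit and KeyboardInterrupt.
    try:
        fn()
        return "ok"
    except Exception:
        return "caught"

def raise_value_error():
    raise ValueError("ordinary error")

def raise_system_exit():
    raise SystemExit(1)

value_error_result = swallow_regular_errors(raise_value_error)

# SystemExit is a BaseException, so it propagates past `except Exception`:
escaped = False
try:
    swallow_regular_errors(raise_system_exit)
except SystemExit:
    escaped = True
```
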
- Around line 234-245: The _load_tokenizer function currently hardcodes
trust_remote_code=True creating an RCE risk; change its signature to accept a
trust_remote_code: bool = False parameter and pass that parameter into both
AutoTokenizer.from_pretrained calls (replace the hardcoded True), and update any
callers (the code that invokes _load_tokenizer) to pass through the intended
trust_remote_code value (defaulting to False) so callers can opt in explicitly
when loading tokenizers; reference function name _load_tokenizer and the
AutoTokenizer.from_pretrained invocations to locate the changes.
---
Nitpick comments:
In `@modelopt/torch/puzzletron/mip/run_puzzle.py`:
- Around line 509-529: Function name has a typo: rename
gather_multi_layer_puzle_metrics to gather_multi_layer_puzzle_metrics; update
the function definition and all call sites that reference
gather_multi_layer_puzle_metrics to use the corrected name, and adjust any
imports/exports or type annotations (e.g., MultiLayerPuzzleMetrics) that
reference the old symbol so references remain consistent.
In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py`:
- Around line 60-61: T_DataClass is defined as a TypeVar at the top and then
re-used incorrectly at the later declaration `T_DataClass: TypeVar =
Type[dataclasses.dataclass]`, which shadows and confuses the type system; remove
the second declaration and either (A) use the original TypeVar `T_DataClass` in
the generic function signatures that need it, or (B) if you meant a type alias
for dataclass types, replace the second declaration with a proper alias using
`TypeAlias` (e.g., `T_DataClass: TypeAlias = Type[dataclasses.dataclass]`).
Ensure all functions referencing `T_DataClass` (the generic parameter) import
and use the original TypeVar instead of the mistaken reassignment.
- Line 28: The module currently sets os.environ["PYTORCH_CUDA_ALLOC_CONF"] at
import time which produces a global side effect; move this environment change
into the runtime path that actually needs it (e.g., inside
launch_calc_subblock_stats or calculate_subblock_stats_for_puzzle_dir) and use
os.environ.setdefault or assign it at the start of that function so importing
the module no longer mutates global process state; update references to the
top-level os.environ assignment (the current module-level line) by removing it
and adding the environment setup at the start of the specified functions
(launch_calc_subblock_stats / calculate_subblock_stats_for_puzzle_dir).
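A minimal illustration of the suggested pattern: `os.environ.setdefault` inside the function both defers the side effect to call time and respects a value the user already exported (the function name here is illustrative):

```python
import os

def configure_cuda_allocator() -> str:
    # Runs only when the pipeline actually starts, not at import time,
    # and never clobbers a value the user has already set.
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
    return os.environ["PYTORCH_CUDA_ALLOC_CONF"]

os.environ.pop("PYTORCH_CUDA_ALLOC_CONF", None)
default_value = configure_cuda_allocator()

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
user_value = configure_cuda_allocator()
```
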
In `@modelopt/torch/puzzletron/tools/logger.py`:
- Around line 89-100: get_caller_location currently walks a fixed 3-frame chain
which is fragile; update get_caller_location to inspect the full call stack
(e.g., via inspect.stack()) and locate the first frame that is not inside the
logger utilities (exclude function names like "get_caller_location", "aprint",
"mprint", "dist_log" and this module's filename) or accept an optional depth
parameter, then return that frame's filename:lineno; modify references to
get_caller_location accordingly so caller location resolution is robust to
refactors.
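The name-anchored stack walk suggested above could look like this sketch (the helper names come from the review comment and may differ in the real module):

```python
import inspect

# Logger-internal frames to skip when locating the real caller.
_LOGGER_HELPERS = {"get_caller_location", "aprint", "mprint", "dist_log"}

def get_caller_location() -> str:
    # Walk outward until the first frame that is not a logger helper,
    # instead of hardcoding three f_back hops.
    for frame_info in inspect.stack():
        if frame_info.function not in _LOGGER_HELPERS:
            return f"{frame_info.filename}:{frame_info.lineno}"
    return "<unknown>"

def mprint(msg: str) -> str:
    # Survives refactors: adding/removing wrapper layers only changes
    # how many helper frames get skipped, not the result.
    return f"[{get_caller_location()}] {msg}"

line = mprint("hello")
```
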
- Around line 103-112: The module currently calls
logging.setLoggerClass(DistributedLogger) at import time which mutates the
global logger class for the whole process; update the module-level docstring to
explicitly state this side-effect and its implications (mention
DistributedLogger, logging.setLoggerClass, and that subsequent
logging.getLogger() calls will use the custom class), describing why this choice
was made and pointing callers to an alternative (e.g., an explicit init
function) if they need to avoid the global mutation.
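The process-wide effect worth documenting can be demonstrated in isolation (the `DistributedLogger` below is a bare stand-in, not the real class):

```python
import logging

class DistributedLogger(logging.Logger):
    """Stand-in with one extra convenience method."""
    def mprint(self, msg: str) -> None:
        self.info(msg)

logging.setLoggerClass(DistributedLogger)

# Every logger created *after* the call uses the custom class...
new_logger = logging.getLogger("puzzletron.demo")
# ...but loggers that already existed (e.g. the root logger) keep theirs.
root_logger = logging.getLogger()
```
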
In `@modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py`:
- Line 263: Replace the assert used for input validation (the line referencing
solutions_path) with an explicit exception: check whether
solutions_path.exists() and if not raise a FileNotFoundError (or ValueError)
with a clear message that includes solutions_path; update the code around the
validation (the block using solutions_path) to use this explicit raise so the
check cannot be skipped by running Python with optimizations.
- Line 284: Replace the bare print call that logs sorted solutions (the line
printing f"sorted solutions by {sort_solutions_by}. {vals[:10]=} {vals[-10:]=}")
with the distributed-aware logger mprint so only main rank prints; import mprint
from the logger module if it's not already imported and call mprint with the
same formatted message, referencing the existing variables sort_solutions_by and
vals to preserve output content and context.
- Around line 188-191: The TODO about symlink-based checkpoint saving indicates
missing work around save_checkpoint_as_symlinks inside the dist.is_master()
branch; open a tracking issue titled "Support symlink checkpoint saving in
Puzzletron" that references save_checkpoint_as_symlinks and the checkpoint save
flow used in validate_puzzle_with_multi_replacements.py (the dist.is_master()
block) and documents expected behavior, platforms, and possible implementation
approaches; meanwhile, replace the silent pass with an explicit runtime-visible
note (e.g., a processLogger.warning or raising NotImplementedError) and add a
TODO comment that references the new issue ID so callers know symlink saving is
intentionally unsupported for now.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cb7e3231-2259-4883-8ea7-c1ebb8e0538a
📒 Files selected for processing (10)
- modelopt/torch/prune/importance_hooks/base_hooks.py
- modelopt/torch/prune/importance_hooks/compare_module_outputs.py
- modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py
- modelopt/torch/puzzletron/mip/run_puzzle.py
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
- modelopt/torch/puzzletron/tools/checkpoint_utils.py
- modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
- modelopt/torch/puzzletron/tools/logger.py
- modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py
✅ Files skipped from review due to trivial changes (1)
- modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
🚧 Files skipped from review as they are similar to previous changes (2)
- modelopt/torch/puzzletron/anymodel/models/qwen3/qwen3_converter.py
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
```python
def init_empty_module(
    module_cls: type[NNModule],
    dtype: torch.dtype,
    *init_args,
    **init_kwargs,
) -> NNModule:
    default_dtype = torch.get_default_dtype()
    current_device = torch.ones(1).device
    torch.set_default_dtype(dtype)
    module = skip_init(module_cls, *init_args, device=current_device, **init_kwargs)
    torch.set_default_dtype(default_dtype)
    return module
```
Global default dtype manipulation is not thread-safe.
torch.set_default_dtype() modifies global state. If multiple threads call init_empty_module concurrently, they may interfere with each other's dtype settings. Consider using a lock or torch.set_default_dtype as a context manager if available in the target PyTorch version.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/checkpoint_utils.py` around lines 107 - 118,
The function init_empty_module changes global dtype via torch.set_default_dtype
which is not thread-safe; protect the set/restore and the call to skip_init by
introducing a module-level threading.Lock (e.g., _default_dtype_lock) and wrap
the torch.set_default_dtype(...) / skip_init(...) / restore sequence in a with
_default_dtype_lock: block (use try/finally to ensure the original dtype is
restored), so concurrent calls cannot interleave; reference init_empty_module,
torch.set_default_dtype, skip_init, and the new _default_dtype_lock in your
change.
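The lock-plus-try/finally shape the prompt asks for, sketched with a plain global standing in for torch's default dtype so the example has no torch dependency (names mirror the suggestion):

```python
import threading

_default_dtype_lock = threading.Lock()
_default_dtype = "float32"  # stand-in for torch.get_default_dtype()

def set_default_dtype(dtype: str) -> None:
    global _default_dtype
    _default_dtype = dtype

def get_default_dtype() -> str:
    return _default_dtype

def init_empty_module(dtype: str) -> str:
    # Serialize the set/construct/restore sequence so concurrent callers
    # never observe each other's temporary default dtype.
    with _default_dtype_lock:
        saved = get_default_dtype()
        set_default_dtype(dtype)
        try:
            constructed_with = get_default_dtype()  # skip_init(...) would run here
        finally:
            # try/finally guarantees restoration even if construction fails.
            set_default_dtype(saved)
    return constructed_with

result = init_empty_module("bfloat16")
```
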
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
cjluo-nv
left a comment
Summary
PR #1121 introduces Puzzletron, an end-to-end heterogeneous LLM compression framework based on the Puzzle paper (arxiv 2411.19146). It adds a 6-step pipeline (activation scoring → pruning → library building → scoring → MIP optimization → model realization) with support for 9 model architectures across 8 families. The PR includes an AnyModel abstraction layer (Converter/ModelDescriptor factories), a Sewing Kit for distributed module graph stitching, MIP-based neural architecture search, expert removal pruning for MoE models, and bypassed training with multi-threaded child initialization. This is a large addition (~94 files) consolidating 18+ sub-PRs.
Issues Found
[Correctness] Thread safety concern in child_init.py shared dictionary mutation
File: modelopt/torch/puzzletron/tools/bypassed_training/child_init.py lines 685-697
Workers in a ThreadPoolExecutor call keys.pop() on a shared dictionary. Each thread processes a different layer_idx, so different key prefixes, meaning keys don't overlap in practice. However, dict.pop() is not guaranteed thread-safe in CPython for concurrent mutation, and nothing in the code documents or enforces this invariant. A future refactor adding overlapping keys or reordering layers could introduce silent data corruption.
Suggestion: Pass each worker a pre-filtered copy of keys, or use a lock around shared dict mutation.
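The suggested fix, each worker filtering its own copy instead of popping from a shared dict, can be sketched like this (the key layout and the doubling "work" are made up for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

def process_layers(keys: dict[str, int], layer_indices: list[int]) -> dict[str, int]:
    def worker(layer_idx: int) -> dict[str, int]:
        # Each thread builds a private, pre-filtered view of its keys,
        # so no thread ever mutates the shared `keys` dict.
        prefix = f"layers.{layer_idx}."
        own = {k: v for k, v in keys.items() if k.startswith(prefix)}
        return {k: v * 2 for k, v in own.items()}  # placeholder per-layer work

    merged: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=4) as executor:
        for partial in executor.map(worker, layer_indices):
            merged.update(partial)  # merging happens in the main thread only
    return merged

state = {"layers.0.weight": 1, "layers.1.weight": 2, "layers.1.bias": 3}
result = process_layers(state, [0, 1])
```
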
[Correctness] default_output_reducer_fn ignores accumulator — trap for custom reducers
File: modelopt/torch/puzzletron/sewing_kit/core.py lines 92, 460-468
The output reduction loop passes acc=None on the first iteration. The default reducer (default_output_reducer_fn) simply returns input_override, ignoring acc entirely, so it works. But any custom reducer that tries to accumulate (the natural interpretation of acc) will fail on the first call when acc is None. This is a latent correctness trap.
Suggestion: Initialize acc to a well-defined identity value, or document that reducers must handle acc=None on first call.
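The trap and the documented contract in miniature: a reducer must treat `acc=None` on the first call as the identity (`sum_reducer` is a hypothetical custom reducer, not sewing-kit API):

```python
def default_output_reducer_fn(acc, output):
    # The default simply keeps the latest output, so acc=None is harmless.
    return output

def sum_reducer(acc, output):
    # A custom accumulating reducer must handle the acc=None first call.
    if acc is None:
        return output
    return acc + output

def reduce_outputs(reducer, outputs):
    acc = None  # first call receives acc=None, per the loop's contract
    for output in outputs:
        acc = reducer(acc, output)
    return acc

last = reduce_outputs(default_output_reducer_fn, [1.0, 2.0, 3.0])
total = reduce_outputs(sum_reducer, [1.0, 2.0, 3.0])
```
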
[Readability] print() instead of mprint() in distributed code
File: modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py line 299
print(f"{dist.rank()} loaded state_dict shard")All other logging in this module uses mprint() (rank-0 only). This print() fires on every rank, producing noisy output in multi-GPU runs.
[Readability] Unresolved TODO in main entry point
File: modelopt/torch/puzzletron/puzzletron.py line 45
# TODO @colunga: fix this, Hydra config should be immutable after this pointThe main pipeline entry point mutates the config object after initialization. This should be resolved before merge or converted to a tracked issue.
[Duplicated Code] Redundant helper functions in MIP solver
File: modelopt/torch/puzzletron/mip/run_puzzle.py lines 728-755
_none_add and _none_max duplicate the logic of _none_add_list and _none_max_list (which handle the single-element case). The scalar versions could be replaced with calls to the list versions.
[Tests] Very limited unit test coverage for high-risk modules
Files: tests/unit/torch/puzzletron/test_convert_anymodel.py (only unit test), tests/gpu/torch/puzzletron/test_puzzletron.py (E2E only)
The test suite has strong E2E coverage (9 model architectures with regression values), but zero unit tests for:
- Sewing Kit (core.py, 883 lines): the distributed stitching framework
- Child initialization (child_init.py, 1181 lines): the largest file, with thread-pool logic
- Replacement library building (build_replacement_library.py, 616 lines)
- MIP solver (run_puzzle.py, 760 lines)
- Sharded checkpoint utilities (sharded_checkpoint_utils.py, 404 lines)
The only unit test (test_convert_anymodel.py) covers a single model family (Qwen3). E2E GPU tests catch integration regressions but won't isolate bugs in individual components. Given the complexity of modules like sewing_kit and child_init, this is a significant gap.
[Readability] Wildcard-style broad imports in several modules
Multiple files use from module import * or import large numbers of symbols without clear need. This makes it harder to trace dependencies across the 94-file addition.
Suggestions
- Add unit tests for sewing_kit, child_init, and the MIP solver: even basic tests for the reducer loop, thread-pool key partitioning, and None-handling helpers would catch the latent issues identified above.
- Replace shared dict mutation in child_init.py with per-worker key sets to eliminate the thread-safety concern entirely.
- Resolve or file the TODO at puzzletron.py:45; config immutability is important for reproducibility in distributed settings.
- Standardize logging: replace the stray print() with mprint() in sharded_checkpoint_utils.py.
Overall Assessment
This is an impressive, well-structured compression framework with clean factory abstractions and strong E2E test coverage across 9 model architectures. The AnyModel Converter/Descriptor pattern is extensible, and the pipeline orchestration in puzzletron.py is clear. The main concerns are: (1) a thread-safety design fragility in the largest file, (2) a latent correctness trap in the sewing kit's reducer interface, and (3) the near-absence of unit tests for complex core modules. None of these are showstoppers for the current code paths, but they create risk for future maintenance. The E2E tests provide a strong safety net, and the codebase is well-organized for its size.
…and hf export (#1122) - In nemo:26.02.01 container, we have DistillationProvider fix for homogeneous models already. That seems sufficient for Heterogeneous models as well so removing copied DistillationProvider to simplify - Replace hacky megatron to hf export logic with simplified one <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Reworked distillation and HuggingFace export flow to use upstream bridge/export APIs, removed local monkey-patching and extra exception logging, and simplified distributed cleanup. * **Chores** * Consolidated and renamed Qwen3 / Qwen3-VL model and converter registrations; updated pruning configs, CLI export flags, and packaging lint/dependency settings. * **Tests** * Updated integration tests to use Qwen3 checkpoints and adjusted export verification. * **Documentation** * Updated README example to reflect new CLI usage. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
What does this PR do?
Implement puzzletron compression algorithm based on Puzzle paper (https://arxiv.org/abs/2411.19146)
The list of reviewed and merged MRs that resulted in the feature/puzzletron branch:
Merging dkorzekwa/any_model to feature/puzzletron
Add anymodel directories to feature/puzzletron by danielkorzekwa · Pull Request #974 · NVIDIA/Model-Optimizer - merged
Draft: anymodel activation scoring by danielkorzekwa · Pull Request #989 · NVIDIA/Model-Optimizer - merged
Draft: Merge anymodel pruning by danielkorzekwa · Pull Request #990 · NVIDIA/Model-Optimizer - merged
Draft: Merging anymodel:build_library_and_stats by danielkorzekwa · Pull Request #993 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/any model calc one block scores by danielkorzekwa · Pull Request #994 · NVIDIA/Model-Optimizer - merged
Draft: merge any_model: mip_and_realize_models by danielkorzekwa · Pull Request #995 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/any model other modeqls by danielkorzekwa · Pull Request #1007 · NVIDIA/Model-Optimizer - merged
PR to 1007: #1039 - merged
Dkorzekwa/anymodel gptoss by danielkorzekwa · Pull Request #1020 · NVIDIA/Model-Optimizer - merged
Merge any_model tutorial by danielkorzekwa · Pull Request #1035 · NVIDIA/Model-Optimizer - merged
Merge mbridge distillation for any_model by danielkorzekwa · Pull Request #1036 · NVIDIA/Model-Optimizer - merged
MR branch for the remaining difference between dkorzekwa/any_model an… by danielkorzekwa · Pull Request #1047 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/decilm hf code cleanup by danielkorzekwa · Pull Request #1071 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/decilm hf code cleanup 2 by danielkorzekwa · Pull Request #1073 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/anymodel subblock stats by danielkorzekwa · Pull Request #1085 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/anymodel subblock stats nodecilm by danielkorzekwa · Pull Request #1102 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/decilm cleanup post subblockstats by danielkorzekwa · Pull Request #1103 · NVIDIA/Model-Optimizer - merged
code clean up by danielkorzekwa · Pull Request #1110 · NVIDIA/Model-Optimizer - merged
Merging into main:
Activation hooks redesign (reuse hooks component across both minitron and puzzletron) by danielkorzekwa · Pull Request #1022 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/puzzletron use importance hooks from prune by danielkorzekwa · Pull Request #1115 · NVIDIA/Model-Optimizer - merged
Usage
Puzzletron tutorial:
https://github.com/NVIDIA/Model-Optimizer/tree/feature/puzzletron/examples/puzzletron
Testing
The main e2e test for compressing 9 models with Puzzletron:
https://github.com/NVIDIA/Model-Optimizer/blob/feature/puzzletron/tests/gpu/torch/puzzletron/test_puzzletron.py
Before your PR is "Ready for review"
CONTRIBUTING.md: ✅